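Pandas: IndexingError: Unalignable boolean Series provided as indexer

Time: 2017-07-27 13:55:13

Tags: python pandas

I am trying to run what I thought was simple code to drop every column that contains only NaN, but I can't get it to work (the same approach with axis = 1 works fine for dropping rows):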

import pandas as pd
import numpy as np

df = pd.DataFrame({'a':[1,2,np.nan,np.nan], 'b':[4,np.nan,6,np.nan], 'c':[np.nan, 8,9,np.nan], 'd':[np.nan,np.nan,np.nan,np.nan]})

df = df[df.notnull().any(axis = 0)]

print(df)

Full error:

raise IndexingError('Unalignable boolean Series provided as '
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match

Expected output:

     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

3 answers:

Answer 0 (score: 11)

You need loc, because you are filtering by columns:

print (df.notnull().any(axis = 0))
a     True
b     True
c     True
d    False
dtype: bool

df = df.loc[:, df.notnull().any(axis = 0)]
print (df)

     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

Or filter the column labels first and then select with []:

print (df.columns[df.notnull().any(axis = 0)])
Index(['a', 'b', 'c'], dtype='object')

df = df[df.columns[df.notnull().any(axis = 0)]]
print (df)

     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

Or use dropna with the parameter how='all', which removes all columns filled only with NaN:

print (df.dropna(axis=1, how='all'))
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

Answer 1 (score: 2)

You can use dropna with axis=1 and thresh=1:

In[19]:
df.dropna(axis=1, thresh=1)

Out[19]: 
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

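This drops any column that does not have at least 1 non-NaN value, which means every column consisting entirely of NaN is dropped.

The reason your attempt failed is that the boolean mask: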
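To make the meaning of thresh concrete, here is a small sketch using the DataFrame from the question. It compares thresh=1 with how='all' and shows what higher thresholds keep; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, np.nan, np.nan],
    'b': [4, np.nan, 6, np.nan],
    'c': [np.nan, 8, 9, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
})

# thresh=N keeps a column only if it has at least N non-NaN values.
by_thresh = df.dropna(axis=1, thresh=1)
by_how = df.dropna(axis=1, how='all')

# For this frame both calls drop only column 'd'.
print(by_thresh.equals(by_how))

# Columns a, b and c each hold exactly two non-NaN values.
at_least_two = df.dropna(axis=1, thresh=2)    # keeps a, b, c
at_least_three = df.dropna(axis=1, thresh=3)  # keeps no columns
print(list(at_least_two.columns), list(at_least_three.columns))
```

Note that how and thresh are passed in separate calls here; recent pandas versions reject passing both at once.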

In[20]:
df.notnull().any(axis = 0)

Out[20]: 
a     True
b     True
c     True
d    False
dtype: bool

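cannot be aligned on the index, which is what plain [] indexing uses by default, because this is a boolean mask over the columns.

Answer 2 (score: 0)

I came here because I was trying to filter on the first 2 letters like this: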
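A minimal sketch of that alignment problem, again using the question's DataFrame: the mask built with axis=0 is labelled by column names, while df[mask] expects labels that match the row index. The names col_mask and row_mask are just for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, np.nan, np.nan],
    'b': [4, np.nan, 6, np.nan],
    'c': [np.nan, 8, 9, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
})

col_mask = df.notnull().any(axis=0)  # one boolean per column, labelled 'a'..'d'
row_mask = df.notnull().any(axis=1)  # one boolean per row, labelled 0..3

# df[mask] filters rows, so the mask's labels must line up with df.index.
print(col_mask.index.equals(df.columns))  # mask labels are the column names
print(row_mask.index.equals(df.index))    # mask labels are the row labels

rows_kept = df[row_mask]         # row filter through [] works
cols_kept = df.loc[:, col_mask]  # column filter needs .loc with an explicit column axis

# The failing call from the question, wrapped so the script keeps running.
try:
    df[col_mask]
except Exception as exc:
    print(type(exc).__name__)
```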

filtered = df[(df.Name[0:2] != 'xx')] 

The fix was:

filtered = df[(df.Name.str[0:2] != 'xx')]
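A short sketch of the difference, with made-up data (the Name values below are hypothetical): df.Name[0:2] slices the first two rows of the Series, so the resulting mask has only two entries and presumably fails to align with the full DataFrame, whereas df.Name.str[0:2] takes the first two characters of every value.

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({'Name': ['xxalpha', 'beta', 'xxgamma', 'delta']})

sliced_rows = df.Name[0:2]          # first two ROWS of the column
first_two_chars = df.Name.str[0:2]  # first two CHARACTERS of every value

print(len(sliced_rows), len(first_two_chars))

# The mask now has one entry per row, so it aligns with df.
filtered = df[first_two_chars != 'xx']

# An equivalent spelling for a prefix test.
same = df[~df.Name.str.startswith('xx')]
print(filtered.equals(same))
```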