I'm trying to run what I thought was simple code to drop every column that is all NaN, but I can't get it to work (axis = 1
works fine when dropping rows):
import pandas as pd
import numpy as np
df = pd.DataFrame({'a':[1,2,np.nan,np.nan], 'b':[4,np.nan,6,np.nan], 'c':[np.nan, 8,9,np.nan], 'd':[np.nan,np.nan,np.nan,np.nan]})
df = df[df.notnull().any(axis = 0)]  # this line raises the IndexingError shown below
print(df)
Full error:
raise IndexingError('Unalignable boolean Series provided as '
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match
Expected output:
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
Answer 0 (score: 11)
You need loc, because you are filtering by column:
print (df.notnull().any(axis = 0))
a     True
b     True
c     True
d    False
dtype: bool
df = df.loc[:, df.notnull().any(axis = 0)]
print (df)
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
Or filter the column labels first, then select with []:
print (df.columns[df.notnull().any(axis = 0)])
Index(['a', 'b', 'c'], dtype='object')
df = df[df.columns[df.notnull().any(axis = 0)]]
print (df)
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
Or use dropna with the parameter how='all', which removes every column filled only with NaN:
print (df.dropna(axis=1, how='all'))
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
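To check that the three approaches above really agree, here is a small sketch that rebuilds the DataFrame from the question and compares the results. The variable names (keep, via_loc, via_columns, via_dropna) are mine, chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'a': [1, 2, np.nan, np.nan],
    'b': [4, np.nan, 6, np.nan],
    'c': [np.nan, 8, 9, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
})

# Boolean Series indexed by column label: True where the column has any non-NaN value.
keep = df.notnull().any(axis=0)

via_loc = df.loc[:, keep]                  # column-wise boolean selection
via_columns = df[df.columns[keep]]         # select by the filtered labels
via_dropna = df.dropna(axis=1, how='all')  # drop the all-NaN columns directly

print(list(via_dropna.columns))
print(via_loc.equals(via_columns) and via_loc.equals(via_dropna))
```

If all you want is to drop the all-NaN columns, dropna is probably the most readable of the three. The loc form is handy when the mask comes from some other condition.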
Answer 1 (score: 2)
You can use dropna with axis=1 and thresh=1:
In[19]:
df.dropna(axis=1, thresh=1)
Out[19]:
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
This drops any column that does not have at least 1 non-NaN value, which means any column that is entirely NaN
gets dropped.
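As a rough illustration of how thresh behaves, the sketch below counts the non-NaN values per column and compares thresh=1 with how='all' on the question's data. The thresh=3 case is my own addition, to show what a stricter threshold does; the variable names are illustrative only:

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'a': [1, 2, np.nan, np.nan],
    'b': [4, np.nan, 6, np.nan],
    'c': [np.nan, 8, 9, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
})

# Number of non-NaN values in each column; thresh is compared against this count.
non_nan_counts = df.notnull().sum(axis=0)

by_thresh = df.dropna(axis=1, thresh=1)  # keep columns with at least 1 non-NaN value
by_how = df.dropna(axis=1, how='all')    # drop columns where every value is NaN
stricter = df.dropna(axis=1, thresh=3)   # keep columns with at least 3 non-NaN values

print(non_nan_counts.to_dict())
print(list(by_thresh.columns), list(stricter.columns))
```

On this data, thresh=1 and how='all' keep the same columns. Since a, b and c each have only two non-NaN values, thresh=3 leaves no columns at all.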
The reason your attempt failed is that the boolean mask:
In[20]:
df.notnull().any(axis = 0)
Out[20]:
a     True
b     True
c     True
d    False
dtype: bool
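cannot be aligned with the row index, which is what df[...] uses by default for a boolean Series; this mask is a boolean mask over the columns instead.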
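The mismatch can be seen by comparing the mask's labels with the DataFrame's two axes. This is a sketch on the question's data. I catch a generic Exception and only report the class name, because the import path of IndexingError has moved between pandas versions; pandas may also emit a reindexing warning before raising:

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'a': [1, 2, np.nan, np.nan],
    'b': [4, np.nan, 6, np.nan],
    'c': [np.nan, 8, 9, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
})

mask = df.notnull().any(axis=0)

# The mask is labelled 'a'..'d' (column labels), not 0..3 (row labels).
labels_match_columns = mask.index.equals(df.columns)
labels_match_rows = mask.index.equals(df.index)

# df[mask] tries to align the mask with the rows, so it fails.
error_name = None
try:
    df[mask]
except Exception as exc:
    error_name = type(exc).__name__

# .loc[:, mask] applies the mask along the column axis, where the labels do match.
by_columns = df.loc[:, mask]

print(labels_match_columns, labels_match_rows, error_name)
print(list(by_columns.columns))
```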
Answer 2 (score: 0)
I came here because I was trying to filter on the first 2 letters like this:
filtered = df[(df.Name[0:2] != 'xx')]
The fix is:
filtered = df[(df.Name.str[0:2] != 'xx')]
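For anyone hitting the same thing, here is a small sketch of the difference, using a made-up DataFrame (the Name values are hypothetical). Slicing the Series itself takes the first two rows, so the resulting mask is shorter than the DataFrame and cannot be aligned with it. The .str accessor slices each string instead:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({'Name': ['xxalpha', 'beta', 'xxgamma', 'delta']})

row_slice = df.Name[0:2]        # first two ROWS of the column
char_slice = df.Name.str[0:2]   # first two CHARACTERS of every value

# The mask built from .str has one entry per row, so it aligns with df.
filtered = df[df.Name.str[0:2] != 'xx']

# An equivalent spelling with str.startswith, which some may find clearer.
filtered_alt = df[~df.Name.str.startswith('xx')]

print(list(row_slice), list(char_slice))
print(list(filtered.Name))
```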