I have the following df:
   AAA  BBB  CCC  DDD  ID1  ID2  ID3  ID4
0  txt  txt  txt  txt   10  NaN   12  NaN
1  txt  txt  txt  txt   10  NaN   12   13
2  txt  txt  txt  txt  NaN  NaN  NaN  NaN
with the following dtypes:
AAA     object
BBB     object
CCC     object
DDD     object
ID1    float64
ID2    float64
ID3    float64
ID4    float64
Is there a way to drop a row only when all of its float columns are NaN?
Expected output:
   AAA  BBB  CCC  DDD  ID1  ID2  ID3  ID4
0  txt  txt  txt  txt   10  NaN   12  NaN
1  txt  txt  txt  txt   10  NaN   12   13
I cannot do this with df.dropna(subset=['ID1','ID2','ID3','ID4']) because my real df has several dynamic float columns, so I cannot hard-code their names.
Thanks
Answer 0 (score: 3)
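Use DataFrame.select_dtypes to get all float columns, then test for non-missing values and use DataFrame.any to keep the rows with at least one non-missing value. This removes the rows whose float columns are all missing: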
df1 = df[df.select_dtypes(float).notna().any(axis=1)]
print(df1)
   AAA  BBB  CCC  DDD   ID1  ID2   ID3   ID4
0  txt  txt  txt  txt  10.0  NaN  12.0   NaN
1  txt  txt  txt  txt  10.0  NaN  12.0  13.0
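For anyone who wants to reproduce this, here is a minimal sketch that rebuilds a frame like the one in the question and applies the same boolean mask. The sample values are copied from the question; the variable name mask is only for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the question: four text columns, four float ID columns.
df = pd.DataFrame({
    "AAA": ["txt"] * 3,
    "BBB": ["txt"] * 3,
    "CCC": ["txt"] * 3,
    "DDD": ["txt"] * 3,
    "ID1": [10, 10, np.nan],
    "ID2": [np.nan, np.nan, np.nan],
    "ID3": [12, 12, np.nan],
    "ID4": [np.nan, 13, np.nan],
})

# True for every row that has at least one non-missing float value.
mask = df.select_dtypes(float).notna().any(axis=1)

# Boolean indexing keeps only the rows where the mask is True.
df1 = df[mask]
print(df1)
```

Keeping the mask in its own variable makes it easy to inspect which rows would be removed (df[~mask]) before actually dropping them.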
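Alternatively, change your DataFrame.dropna solution so that it passes the float columns as subset and sets how='all', which drops a row only when all of those values are NaN: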
df1 = df.dropna(subset=df.select_dtypes(float).columns, how='all')
# to modify the same DataFrame in place instead:
# df.dropna(subset=df.select_dtypes(float).columns, how='all', inplace=True)
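The dropna variant can be wrapped in a small helper, sketched below. The function name drop_rows_all_float_nan and the tiny sample frame are made up for illustration. The guard for "no float columns" is an assumption about what you would want in that case; it avoids relying on how dropna treats an empty subset.

```python
import numpy as np
import pandas as pd

def drop_rows_all_float_nan(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame without the rows whose float columns are all NaN."""
    float_cols = frame.select_dtypes(float).columns
    if len(float_cols) == 0:
        # No float columns: nothing to test, so keep every row.
        return frame.copy()
    # how='all' drops a row only if every value in the subset is missing.
    return frame.dropna(subset=float_cols, how="all")

# Hypothetical small frame: the last row has NaN in both float columns.
df = pd.DataFrame({
    "AAA": ["txt", "txt", "txt"],
    "ID1": [10, 10, np.nan],
    "ID2": [np.nan, 13, np.nan],
})
result = drop_rows_all_float_nan(df)
print(result)
```

Because the helper returns a new frame, the original df is left untouched, which is usually easier to reason about than inplace=True.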
If needed, you can match every float dtype (not only the default one) with np.floating:
import numpy as np
df1 = df.dropna(subset=df.select_dtypes(np.floating).columns, how='all')
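As a rough illustration of the np.floating selector, the sketch below uses a made-up frame whose ID columns are stored with three different float widths. The column names and values are assumptions, not data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with float columns of different precisions.
df = pd.DataFrame({
    "AAA": ["txt", "txt", "txt"],
    "ID1": np.array([10, 10, np.nan], dtype=np.float64),
    "ID2": np.array([np.nan, 2, np.nan], dtype=np.float32),
    "ID3": np.array([np.nan, np.nan, np.nan], dtype=np.float16),
})

# np.floating is the abstract parent of all NumPy float types,
# so float16, float32 and float64 columns are all selected.
float_cols = df.select_dtypes(np.floating).columns

# Drop the rows in which every selected float column is NaN.
df1 = df.dropna(subset=float_cols, how="all")
print(float_cols.tolist())
print(df1)
```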
Answer 1 (score: 1)
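Use
df.dropna(subset=df.select_dtypes(include=np.number).columns, how='all')
I suggest using include=np.number because it covers all float dtypes, any of which may contain NaN. With include=float, you only get the standard float dtypes such as np.float64, so a column stored as np.float16 is missed. Note that np.number also matches integer columns; np.floating limits the selection to float dtypes.
For example: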
>>> df['ID5'] = np.array([1, 2, np.nan], dtype=np.float16)
>>> df.select_dtypes(include=float).columns.tolist()
['ID1', 'ID2', 'ID3', 'ID4']
>>> df.select_dtypes(include=np.number).columns.tolist()
['ID1', 'ID2', 'ID3', 'ID4', 'ID5']
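One thing worth checking before choosing between the selectors: np.number also picks up integer columns, and a NumPy integer column can never hold NaN. The sketch below, using a made-up frame with a hypothetical COUNT column, compares np.number with np.floating under how='all'.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one integer column and one float column.
df = pd.DataFrame({
    "AAA": ["txt", "txt"],
    "COUNT": [1, 2],          # integer dtype, never NaN
    "ID1": [10.0, np.nan],    # float dtype, NaN in the second row
})

# np.number selects COUNT and ID1; COUNT is never missing,
# so no row has "all" subset values missing and nothing is dropped.
with_number = df.dropna(
    subset=df.select_dtypes(include=np.number).columns, how="all"
)

# np.floating selects only ID1, so the second row is dropped.
with_floating = df.dropna(
    subset=df.select_dtypes(include=np.floating).columns, how="all"
)

print(len(with_number), len(with_floating))
```

If the frame has no integer columns, both selectors should give the same result; np.floating is the narrower choice when only float columns are meant to count.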
Answer 2 (score: 0)
You can replace 0 with NaN and then drop the columns that contain only NaN (note that this filters columns, not rows):
df.loc[:,~df.replace(0,np.nan).isna().all()]
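To see what this expression actually returns, here is a small check on a made-up frame (the column names are assumptions). It keeps all rows and removes the column that holds nothing but NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: ID2 contains only NaN, ID1 has NaN in one row.
df = pd.DataFrame({
    "AAA": ["txt", "txt", "txt"],
    "ID1": [10, 10, np.nan],
    "ID2": [np.nan, np.nan, np.nan],
})

# isna().all() is evaluated per column, so the mask selects columns.
kept = df.loc[:, ~df.replace(0, np.nan).isna().all()]
print(kept.columns.tolist())
print(len(kept))
```

So this approach answers a slightly different question (empty columns) than the one asked (rows whose float values are all missing).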