I have a dataframe:
import numpy as np
import pandas as pd

dfx = pd.DataFrame({
    'city': ['city1', 'city2', 'city3', 'city4'],
    'state': ['state1', 'state2', 'state3', 'state4'],
    2005: [144, 205, 123, np.nan],
    2006: [173, 211, 123, 124],
    2007: [np.nan, np.nan, np.nan, np.nan],
    2008: [np.nan, 206, np.nan, np.nan],
    2009: [np.nan, np.nan, 124, 123],
    2010: [128, 273, np.nan, np.nan]
})
print(dfx)
I want to create a new dataframe containing only the rows that have 3 or more NaN values.
Answer (score: 2):
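You can test for missing values with DataFrame.isna, count the True values in each row with sum(axis=1), and then use Series.ge to check whether each count is greater than or equal to 3, keeping the matching rows with boolean indexing: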
df = dfx[dfx.isna().sum(axis=1).ge(3)]
# if the first 2 columns (city, state) should be excluded from the count:
# df = dfx[dfx.iloc[:, 2:].isna().sum(axis=1).ge(3)]
print(df)
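The same one-liner can be wrapped in a small reusable function. This is a minimal sketch: the function name `rows_with_min_nans` and its `skip_cols` parameter are hypothetical. Passing `skip_cols=2` reproduces the commented-out `iloc[:, 2:]` variant, which ignores the `city` and `state` columns when counting.

```python
import numpy as np
import pandas as pd


def rows_with_min_nans(df: pd.DataFrame, n: int, skip_cols: int = 0) -> pd.DataFrame:
    """Return the rows of df that contain at least n missing values.

    Hypothetical helper: the first skip_cols columns are excluded from the count.
    """
    # Count missing values per row, ignoring the first skip_cols columns
    counts = df.iloc[:, skip_cols:].isna().sum(axis=1)
    # Boolean mask: True where the row has n or more NaNs
    return df[counts.ge(n)]


dfx = pd.DataFrame({
    'city': ['city1', 'city2', 'city3', 'city4'],
    'state': ['state1', 'state2', 'state3', 'state4'],
    2005: [144, 205, 123, np.nan],
    2006: [173, 211, 123, 124],
    2007: [np.nan, np.nan, np.nan, np.nan],
    2008: [np.nan, 206, np.nan, np.nan],
    2009: [np.nan, np.nan, 124, 123],
    2010: [128, 273, np.nan, np.nan],
})

# Count NaNs in all columns, then only in the year columns
print(rows_with_min_nans(dfx, 3).index.tolist())               # [0, 2, 3]
print(rows_with_min_nans(dfx, 3, skip_cols=2).index.tolist())  # [0, 2, 3]
```

Because `city` and `state` contain no missing values in this data, both calls select the same rows; `skip_cols` only makes a difference when those label columns can themselves be NaN.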
    city   state   2005  2006  2007  2008   2009   2010
0  city1  state1  144.0   173   NaN   NaN    NaN  128.0
2  city3  state3  123.0   123   NaN   NaN  124.0    NaN
3  city4  state4    NaN   124   NaN   NaN  123.0    NaN
Details (number of missing values per row):
print (dfx.isna().sum(axis=1))
0    3
1    2
2    3
3    4
dtype: int64
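To make each stage of the chain visible, the sketch below prints the per-row counts, the boolean mask produced by `Series.ge`, and the selected row labels. For brevity it assumes a trimmed copy of `dfx` holding only the year columns (named `years` here, a hypothetical name); this does not change the counts because `city` and `state` have no missing values. It also shows that `.ge(3)` is simply the method form of `>= 3`.

```python
import numpy as np
import pandas as pd

# Year columns of dfx only; city/state have no NaNs, so the counts are unchanged
years = pd.DataFrame({
    2005: [144, 205, 123, np.nan],
    2006: [173, 211, 123, 124],
    2007: [np.nan] * 4,
    2008: [np.nan, 206, np.nan, np.nan],
    2009: [np.nan, np.nan, 124, 123],
    2010: [128, 273, np.nan, np.nan],
})

counts = years.isna().sum(axis=1)   # number of NaNs in each row
mask = counts.ge(3)                 # True where a row has 3 or more NaNs

print(counts.tolist())              # [3, 2, 3, 4]
print(mask.tolist())                # [True, False, True, True]
print(mask.equals(counts >= 3))     # True
print(years[mask].index.tolist())   # [0, 2, 3]
```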