Is there a function in Python that lets me count the number of non-missing values in an array?
My data:
df.loc[df.wealth < 25000, 'wealth1'] = df.wealth
df.loc[(df.wealth < 50000) & (df.wealth > 25000), 'wealth2'] = df.wealth
df.loc[(df.wealth < 75000) & (df.wealth > 50000), 'wealth3'] = df.wealth
...
id, income, wealth, wealth1, wealth2, ... wealth9
1, 100000, 20000, 20000, ,...,
2, 60000, 40000, , 40000, ...,
3, 70000, 23000, 23000, , ...,
4, 80000, 75000, , ,..., 75000
...
What I have so far:
income_brackets = [(0, 25000), (25000, 50000), (50000, 100000)]
source = {'wealth1': [], 'wealth2': [], .... 'wealth9': []}
for lower, upper in income_brackets:
    for key in source:
        source[key].append(len(df.query('income > {} and income < {}'.format(lower, upper))[np.logical_not(np.isnan([key]))]))
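But this doesn't work, because np.isnan('wealth1') is invalid. It only works with np.isnan(df.wealth1), and I can't figure out how to work that into my for loop. I'm very new to Python, so maybe (hopefully) I'm missing something obvious.
Any suggestions or questions would be great. Thanks! Cheers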
Answer 0 (score: 2)
The best way to do this is to use the count method of the DataFrame object:
In [18]: data = randn(1000, 3)
In [19]: data
Out[19]:
array([[ 0.1035, 0.9239, 0.3902],
[ 0.2022, -0.1755, -0.4633],
[ 0.0595, -1.3779, -1.1187],
...,
[ 1.3931, 0.4087, 2.348 ],
[ 1.2746, -0.6431, 0.0707],
[-1.1062, 1.3949, 0.3065]])
In [20]: data[rand(len(data)) > 0.5] = nan
In [21]: data
Out[21]:
array([[ 0.1035, 0.9239, 0.3902],
[ 0.2022, -0.1755, -0.4633],
[ nan, nan, nan],
...,
[ 1.3931, 0.4087, 2.348 ],
[ 1.2746, -0.6431, 0.0707],
[-1.1062, 1.3949, 0.3065]])
In [22]: df = DataFrame(data, columns=list('abc'))
In [23]: df.head()
Out[23]:
a b c
0 0.1035 0.9239 0.3902
1 0.2022 -0.1755 -0.4633
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
[5 rows x 3 columns]
In [24]: df.count()
Out[24]:
a 498
b 498
c 498
dtype: int64
In [26]: df.notnull().sum()
Out[26]:
a 498
b 498
c 498
dtype: int64
As with many pandas methods, this also works on Series objects:
In [27]: df.a.count()
Out[27]: 498
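The session above appears to rely on names such as randn, rand, nan and DataFrame already being imported into the IPython namespace. As a rough script equivalent with explicit imports (the seed and the variable names per_column, via_mask and single are my own choices, not part of the original session), something like this should behave the same way:

```python
import numpy as np
import pandas as pd

# Seeded generator so the run is repeatable; the seed value is arbitrary.
rng = np.random.default_rng(0)

# 1000 x 3 array of standard-normal values, as in the session above.
data = rng.standard_normal((1000, 3))

# Blank out roughly half of the rows. The boolean mask selects whole rows,
# so every column ends up with the same number of missing values.
data[rng.random(len(data)) > 0.5] = np.nan

df = pd.DataFrame(data, columns=list("abc"))

per_column = df.count()        # non-missing values per column
via_mask = df.notnull().sum()  # same result, computed from a boolean mask
single = df["a"].count()       # count also works on a single Series

print(per_column)
print(single)
```

The exact counts depend on the random draw, so they will not match the 498 shown in the session.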
Answer 1 (score: 0)
pandas lets you access columns in the following way:
np.isnan(df['wealth1'])
Incidentally, even if that weren't the case, you could still do
np.isnan(getattr(df, 'wealth1'))
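Putting string-based column access together with count, the loop from the question could be sketched roughly as follows. The small DataFrame is only a stand-in built from the first rows of the sample data in the question, limited to two of the wealth columns, and the names columns and in_bracket are hypothetical. The strict inequalities are kept from the question, so an income exactly on a bracket boundary falls into no bracket.

```python
import numpy as np
import pandas as pd

# Stand-in data: first four rows of the question's sample, two wealth columns.
df = pd.DataFrame({
    "income": [100000, 60000, 70000, 80000],
    "wealth1": [20000, np.nan, 23000, np.nan],
    "wealth2": [np.nan, 40000, np.nan, np.nan],
})

income_brackets = [(0, 25000), (25000, 50000), (50000, 100000)]
columns = ["wealth1", "wealth2"]
source = {key: [] for key in columns}

for lower, upper in income_brackets:
    # Rows whose income lies strictly inside the bracket.
    in_bracket = df[(df["income"] > lower) & (df["income"] < upper)]
    for key in columns:
        # Look the column up by its string name, then count non-missing values.
        source[key].append(int(in_bracket[key].count()))

print(source)
```

Using count here avoids building the np.isnan mask by hand, and df[key] is what lets the column name come from a loop variable.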