Python / Pandas: count the occurrences of each value in a Series and report them, including the NaN count

Time: 2018-07-07 21:09:32

Tags: python pandas series

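In the following code, I want to count how many times each particular value appears in the "Value" column and report that count in the "Count_Non_Null" column. Likewise, I want to count how many times null (np.nan) appears and report that in the "Count_Nulls" column.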

In the example below, the value "NFLX" appears once, "FB" appears twice, "MSFT" appears 3 times, and so on. np.nan appears 4 times.

The goal is to produce output similar to the one shown in the image.

[Image: how the report is expected to appear]

1 answer:

Answer 0: (score: 1)

Try using transform with size to get the count of each non-NaN value, then sum the NaNs and use loc to assign that total to the NaN rows.

Setup

import pandas as pd
import numpy as np

data = { 
    'Value': [
        'NFLX','FB','GOOG','VZ',np.nan,'MSFT','AMZN',
        np.nan,'MSFT',np.nan,'MSFT','INTC','AAPL',
        np.nan,'AMZN','FB'
    ]
}

df = pd.DataFrame(data) # no need for 'columns' argument

Call transform with size and add the NaNs

df = df.assign(
    Count_Non_Null=df.groupby('Value')['Value'].transform('size'), # call .fillna(False) here if you need it
    Count_Nulls=np.nan # You can also use False here
)

df.loc[pd.isnull(df['Value']), 'Count_Nulls'] = pd.isnull(df['Value']).sum()
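
A note on why the NaN rows come out as NaN in Count_Non_Null: as far as I know, groupby leaves NaN keys out of the groups by default, so those rows never get a group size. The small frame below is made up purely for illustration (it is not the data from the question), and the behaviour is what I would expect on a recent pandas release.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame (hypothetical data, not the question's)
demo = pd.DataFrame({'Value': ['FB', np.nan, 'FB', 'VZ']})

# groupby skips NaN keys by default, so the NaN row belongs to no group
# and transform('size') leaves NaN in that position
sizes = demo.groupby('Value')['Value'].transform('size')

print(sizes)
```

Because one position holds NaN, the counts are stored as floats, which is why the counts print as 1.0, 2.0 and so on rather than as integers.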

Result

>>> df
   Value  Count_Non_Null  Count_Nulls
0   NFLX             1.0          NaN
1     FB             2.0          NaN
2   GOOG             1.0          NaN
3     VZ             1.0          NaN
4    NaN             NaN          4.0
5   MSFT             3.0          NaN
6   AMZN             2.0          NaN
7    NaN             NaN          4.0
8   MSFT             3.0          NaN
9    NaN             NaN          4.0
10  MSFT             3.0          NaN
11  INTC             1.0          NaN
12  AAPL             1.0          NaN
13   NaN             NaN          4.0
14  AMZN             2.0          NaN
15    FB             2.0          NaN
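
If a per-row report is not strictly required, a compact tally may be enough. The following is only a sketch of an alternative, not part of the approach above: it uses Series.value_counts, whose dropna=False option keeps NaN as its own entry. The variable names (values, summary, non_null_counts, null_count) are mine.

```python
import numpy as np
import pandas as pd

values = pd.Series(
    ['NFLX', 'FB', 'GOOG', 'VZ', np.nan, 'MSFT', 'AMZN',
     np.nan, 'MSFT', np.nan, 'MSFT', 'INTC', 'AAPL',
     np.nan, 'AMZN', 'FB'],
    name='Value',
)

# dropna=False keeps NaN as a separate entry in the tally
summary = values.value_counts(dropna=False)

# By default value_counts excludes NaN, giving only the non-null counts
non_null_counts = values.value_counts()

# Total number of missing entries
null_count = int(values.isna().sum())

print(summary)
print(null_count)
```

This gives one line per distinct value instead of repeating the count on every row, so it suits a summary better than the column layout asked for in the question.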