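In the following code, I want to count how many times each specific value occurs in the "Value" column and report that count in the "Count_Non_Null" column. Likewise, I want to count how many times null (np.nan) occurs and report that in the "Count_Nulls" column.
In the example below, the value "NFLX" occurs once, "FB" occurs twice, "MSFT" occurs three times, and so on. np.nan occurs 4 times.
The goal is to produce output similar to the one shown in the image.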
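Answer 0 (score: 1)
Try using transform with size to get the counts of the non-NaN values, then sum the NaN flags to get the null count and use loc to assign that total to the NaN rows.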
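To show why transform is used here rather than a plain aggregation, the following is a minimal sketch on hypothetical toy data (the names toy, group_sizes and row_sizes are illustrative and not part of the original answer). It assumes the default groupby behaviour of dropping NaN keys.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({'Value': ['a', 'b', 'a', np.nan]})

# Aggregation: one entry per group; the NaN key is dropped by default
group_sizes = toy.groupby('Value')['Value'].size()

# Transform: the group size is broadcast back to every row,
# aligned with the original index; rows whose key is NaN get NaN
row_sizes = toy.groupby('Value')['Value'].transform('size')

print(group_sizes)
print(row_sizes)
```

Because the transformed result keeps the original index, it can be assigned directly as a new column, which is what the answer relies on.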
Setup
import pandas as pd
import numpy as np
data = {
'Value': [
'NFLX','FB','GOOG','VZ',np.nan,'MSFT','AMZN',
np.nan,'MSFT',np.nan,'MSFT','INTC','AAPL',
np.nan,'AMZN','FB'
]
}
df = pd.DataFrame(data) # no need for 'columns' argument
Call transform with size and add the NaN count
df = df.assign(
Count_Non_Null=df.groupby('Value')['Value'].transform('size'), # call .fillna(False) here if you need it
Count_Nulls=np.nan # You can also use False here
)
df.loc[pd.isnull(df['Value']), 'Count_Nulls'] = pd.isnull(df['Value']).sum()
Result
>>> df
   Value  Count_Non_Null  Count_Nulls
0   NFLX             1.0          NaN
1     FB             2.0          NaN
2   GOOG             1.0          NaN
3     VZ             1.0          NaN
4    NaN             NaN          4.0
5   MSFT             3.0          NaN
6   AMZN             2.0          NaN
7    NaN             NaN          4.0
8   MSFT             3.0          NaN
9    NaN             NaN          4.0
10  MSFT             3.0          NaN
11  INTC             1.0          NaN
12  AAPL             1.0          NaN
13   NaN             NaN          4.0
14  AMZN             2.0          NaN
15    FB             2.0          NaN
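As a possible alternative, the same two columns can be built without groupby by mapping the result of value_counts back onto the column. This is a sketch of a different technique, not the method used in the answer above; the names counts, is_null and alt are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Value': [
        'NFLX', 'FB', 'GOOG', 'VZ', np.nan, 'MSFT', 'AMZN',
        np.nan, 'MSFT', np.nan, 'MSFT', 'INTC', 'AAPL',
        np.nan, 'AMZN', 'FB'
    ]
})

# value_counts excludes NaN by default, so only real values are counted
counts = df['Value'].value_counts()

# Boolean mask marking the rows where Value is missing
is_null = df['Value'].isna()

alt = df.assign(
    # Look up each value's count; NaN rows have no match and stay NaN
    Count_Non_Null=df['Value'].map(counts),
    # Put the total number of nulls on the NaN rows only
    Count_Nulls=np.where(is_null, is_null.sum(), np.nan),
)

print(alt)
```

Both columns come out as floats because they contain NaN. If integer counts are preferred, one option is to convert them afterwards with astype('Int64'), pandas' nullable integer type.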