I use:
import numpy as np
import pandas as pd
bins = pd.cut(data['R10rank'], list(np.arange(0.0, 1.1, 0.1)))
summary = data.groupby(bins)['Ret20d'].agg(['count', 'mean'])  # renamed from `sum`, which shadows the builtin
to create the following statistics:
            count      mean
R10rank
(0.0, 0.1]   1044  4.782833
(0.1, 0.2]    809  5.527745
(0.2, 0.3]    746  5.181306
(0.3, 0.4]    706  4.034747
(0.4, 0.5]    627  3.119654
(0.5, 0.6]    585  1.977387
(0.6, 0.7]    609 -0.602742
(0.7, 0.8]    493 -2.745312
(0.8, 0.9]    412 -2.476791
(0.9, 1.0]    374 -6.364374
Next, I would like to see bins that summarize the statistics over different ranges of values.
For example:
<0.1
<0.3
<0.5
>0.5
>0.7
etc
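So the second row would contain the count and mean for all values of R10rank between 0 and 0.3. The fourth row would contain the count and mean for all values of R10rank > 0.5.
Can I use pd.cut for this as well? If not, what would be a simpler approach?
Thanks.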
Answer 0: (score: 0)
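You can use expanding on the per-bin summary table: weight each bin's mean by its count, take the running sum, and divide the running total by the running count to get the cumulative mean.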
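# df is the per-bin summary from the question (columns 'count' and 'mean')
df['New'] = df['count'] * df['mean']  # per-bin sum of Ret20d
df.expanding(min_periods=1).sum().assign(mean=lambda x: x['New'] / x['count'])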
Out[105]:
            count      mean           New
R10rank
(0.0,0.1]  1044.0  4.782833   4993.277652
(0.1,0.2]  1853.0  5.108054   9465.223357
(0.2,0.3]  2599.0  5.129080  13330.477633
(0.3,0.4]  3305.0  4.895313  16179.009015
(0.4,0.5]  3932.0  4.612165  18135.032073
(0.5,0.6]  4517.0  4.270933  19291.803468
(0.6,0.7]  5126.0  3.691911  18924.733590
(0.7,0.8]  5619.0  3.127121  17571.294774
(0.8,0.9]  6031.0  2.744297  16550.856882
(0.9,1.0]  6405.0  2.212425  14170.581006
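
The output above only covers the "less than" rows. For the ">0.5" and ">0.7" rows the same idea should work if the table is accumulated from the bottom up. Below is a minimal sketch, assuming the per-bin summary from the question; the helper `pooled` and the row labels are made up for illustration and are not part of pandas. Note that the bins are closed on the right, so the rows are really "<=0.3" rather than "<0.3".

```python
import pandas as pd

# Per-bin summary copied from the question (count and mean of Ret20d per R10rank bin).
lower = [round(0.1 * i, 1) for i in range(10)]        # left edges 0.0 ... 0.9
upper = [round(0.1 * (i + 1), 1) for i in range(10)]  # right edges 0.1 ... 1.0
df = pd.DataFrame({
    "count": [1044, 809, 746, 706, 627, 585, 609, 493, 412, 374],
    "mean": [4.782833, 5.527745, 5.181306, 4.034747, 3.119654,
             1.977387, -0.602742, -2.745312, -2.476791, -6.364374],
})


def pooled(summary):
    """Hypothetical helper: running count and count-weighted mean, row by row."""
    count = summary["count"].cumsum()
    total = (summary["count"] * summary["mean"]).cumsum()
    return pd.DataFrame({"count": count, "mean": total / count})


# Accumulate from the top: each row covers R10rank <= that bin's right edge.
below = pooled(df)
below.index = [f"<={u}" for u in upper]

# Accumulate from the bottom: each row covers R10rank > that bin's left edge.
above = pooled(df.iloc[::-1])
above.index = [f">{l}" for l in reversed(lower)]

# Keep only the thresholds asked about in the question.
result = pd.concat([below.loc[["<=0.1", "<=0.3", "<=0.5"]],
                    above.loc[[">0.5", ">0.7"]]])
print(result)
```

This works from the summary table alone, because a count-weighted average of the bin means equals the mean of the pooled raw values. If the original `data` frame is still available, filtering it with a boolean mask per threshold and aggregating `Ret20d` should give the same numbers.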