I am trying to use the expanding window functionality to compute statistics on a time-indexed DataFrame.
import numpy as np
import pandas as pd

date_range = pd.date_range('2015-01-01', '2019-12-31')
df = pd.DataFrame(np.random.rand(len(date_range)), index=date_range, columns=['X'])
df['X'].expanding(min_periods=1).apply(lambda x: np.nanpercentile(x, 50))
How can I aggregate several different percentiles for each date, to get something like this:
50Th_percentile 60Th_percentile
2015-01-01 0.373176 0.373176
2015-01-02 0.601829 0.647559
2015-01-03 0.373176 0.464637
2015-01-04 0.335774 0.358215
Something like the following does not work:
df['X'].expanding(min_periods=1).apply(lambda x:
x.agg[np.nanpercentile(x.values,50),np.nanpercentile(x.values,60)])
Answer 0 (score: 2)
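Use agg on the expanding window with a dictionary that maps each output label to a function. Adding more percentiles only takes more dictionary entries: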
(df['X'].expanding(min_periods=1)
        .agg({'50': lambda x: np.nanpercentile(x, 50),
              '60': lambda x: np.nanpercentile(x, 60)})
        .add_suffix('_percentile'))
50_percentile 60_percentile
2015-01-01 0.520419 0.520419
2015-01-02 0.726755 0.768022
2015-01-03 0.547734 0.624805
2015-01-04 0.597529 0.627407
2015-01-05 0.547734 0.587570
... ... ...
2019-12-27 0.485802 0.591071
2019-12-28 0.485946 0.590530
2019-12-29 0.486197 0.591342
2019-12-30 0.486448 0.591684
2019-12-31 0.486467 0.591791
[1826 rows x 2 columns]
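As a possible alternative, not part of the original answer: pandas also has a built-in Expanding.quantile method, which takes a quantile between 0 and 1 and avoids calling a Python lambda for every window. The sketch below assumes the default linear interpolation is acceptable and uses a shorter, seeded sample so it runs quickly; the names rng, expanding and quantiles are only illustrative.

```python
import numpy as np
import pandas as pd

# Smaller, seeded sample than in the question (an assumption, for a quick run)
rng = np.random.default_rng(0)
date_range = pd.date_range("2015-01-01", "2015-03-01")
df = pd.DataFrame(rng.random(len(date_range)), index=date_range, columns=["X"])

expanding = df["X"].expanding(min_periods=1)

# One column per percentile; quantile() expects a fraction, so 50 becomes 0.5
quantiles = pd.DataFrame(
    {f"{p}_percentile": expanding.quantile(p / 100) for p in (50, 60)}
)
print(quantiles.head())
```

On data without missing values this should agree with the np.nanpercentile version up to floating-point error, since both interpolate linearly by default.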
For a longer list of percentiles, the dictionary can be built programmatically:
list_percentiles = [50, 60, 70, 80, 90]
agg_percentiles = dict(zip(map(str, list_percentiles),
                           map(lambda val: lambda x: np.nanpercentile(x, val),
                               list_percentiles)))
df['X'].expanding(min_periods=1).agg(agg_percentiles).add_suffix('_percentiles')
The dictionary can also be built with a comprehension instead of map. Either way, the output is:
50_percentiles 60_percentiles 70_percentiles 80_percentiles \
2015-01-01 0.520419 0.520419 0.520419 0.520419
2015-01-02 0.726755 0.768022 0.809290 0.850557
2015-01-03 0.547734 0.624805 0.701877 0.778948
2015-01-04 0.597529 0.627407 0.675902 0.761632
2015-01-05 0.547734 0.587570 0.627407 0.704478
... ... ... ... ...
2019-12-27 0.485802 0.591071 0.691039 0.796903
2019-12-28 0.485946 0.590530 0.690891 0.796819
2019-12-29 0.486197 0.591342 0.691197 0.796735
2019-12-30 0.486448 0.591684 0.691088 0.796652
2019-12-31 0.486467 0.591791 0.691237 0.796568
90_percentiles
2015-01-01 0.520419
2015-01-02 0.891824
2015-01-03 0.856020
2015-01-04 0.847361
2015-01-05 0.818785
... ...
2019-12-27 0.899027
2019-12-28 0.898999
2019-12-29 0.898971
2019-12-30 0.898943
2019-12-31 0.898916
[1826 rows x 5 columns]
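The answer mentions a comprehension variant, but its code is not shown here. The following is only a sketch of what such a variant might look like, not the answerer's original code. It binds p as a default argument so that each lambda keeps its own percentile, and it uses a shorter, seeded sample (an assumption) so it runs quickly.

```python
import numpy as np
import pandas as pd

# Smaller, seeded sample than in the question (an assumption, for a quick run)
rng = np.random.default_rng(0)
date_range = pd.date_range("2015-01-01", "2015-03-01")
df = pd.DataFrame(rng.random(len(date_range)), index=date_range, columns=["X"])

list_percentiles = [50, 60, 70, 80, 90]

# p=p freezes the current percentile inside each lambda; without it,
# every lambda would see the last value of p once the loop has finished
agg_percentiles = {
    str(p): (lambda x, p=p: np.nanpercentile(x, p)) for p in list_percentiles
}

result = (
    df["X"]
    .expanding(min_periods=1)
    .agg(agg_percentiles)
    .add_suffix("_percentiles")
)
print(result.tail())
```

The map-based version in the answer does not need the default argument, because its outer lambda already creates a new scope for each val.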