Unexpected Pandas groupby result

Time: 2015-04-12 22:43:07

Tags: python pandas group-by

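I have time series data in "stacked" (long) format and want to compute a rolling function based on two columns. However, as the example below shows, groupby concatenates my results horizontally rather than vertically. I can apply stack at the end to get back to the long format, but I think the correct behavior would be to concatenate vertically, so that the result can be assigned back to the original DataFrame (something like x['res'] = df.groupby(...).apply(func)). Does anyone know why groupby does not behave as expected here, or what I am doing wrong?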

x
Out[52]: 
    group      month         a         b
0   18527 2014-09-01  0.534152  0.973451
1   18527 2014-10-01  0.079879  0.354498
2   18527 2014-11-01  0.032298  0.203997
3   18527 2014-12-01  0.148435  0.352703
4   18527 2015-01-01  0.879930  0.819328
5   18527 2015-02-01  0.475297  0.693203
6   18527 2015-03-01  0.223759  0.731594
7   18527 2015-04-01  0.391933  0.332801
8   18671 2014-09-01  0.740621  0.305298
9   18671 2014-10-01  0.230585  0.772569
10  18671 2014-11-01  0.664834  0.755219
11  18671 2014-12-01  0.987118  0.896310
12  18671 2015-01-01  0.228804  0.058641
13  18671 2015-02-01  0.415715  0.182683
14  18671 2015-03-01  0.574570  0.144686
15  18671 2015-04-01  0.488804  0.545102

x.dtypes
Out[53]: 
group             int64
month    datetime64[ns]
a               float64
b               float64
dtype: object

def func(s):
    return pd.rolling_sum(s.a, 3) / pd.rolling_sum(s.b, 3)


x.set_index('month').groupby('group').apply(func)
Out[55]: 
month  2014-09-01  2014-10-01  2014-11-01  2014-12-01  2015-01-01  2015-02-01
group
18527         NaN         NaN    0.421900    0.286010    0.770814    0.806152   
18671         NaN         NaN    0.892505    0.776593    1.099748    1.434238   

month  2015-03-01  2015-04-01  
group                          
18527    0.703609    0.620728  
18671    3.158185    1.695287  

x.set_index('month').groupby('group').apply(func).stack()
Out[56]: 
group  month     
18527  2014-11-01    0.421900
       2014-12-01    0.286010
       2015-01-01    0.770814
       2015-02-01    0.806152
       2015-03-01    0.703609
       2015-04-01    0.620728
18671  2014-11-01    0.892505
       2014-12-01    0.776593
       2015-01-01    1.099748
       2015-02-01    1.434238
       2015-03-01    3.158185
       2015-04-01    1.695287
dtype: float64

1 answer:

Answer 0 (score: 1)

You can convert the result to a DataFrame inside func():

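def func(s):
    # Returning a DataFrame (via to_frame) lets apply concatenate the groups vertically.
    # Note: pd.rolling_sum was removed in later pandas releases; Series.rolling(3).sum() replaces it.
    return (pd.rolling_sum(s.a, 3) / pd.rolling_sum(s.b, 3)).dropna().to_frame()

# df here is the frame called x in the question
df.groupby('group').apply(func)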
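
For readers on a recent pandas release, where pd.rolling_sum no longer exists, here is a minimal sketch of one way to get a row-aligned result that can be assigned straight back to the frame. It is an alternative to the answer above, not the answerer's method: it swaps apply for groupby(...).transform, which returns a result with the same index as its input. The sample frame reuses only the first four months of each group from the question, and the column name res is the one the question proposes. As far as I can tell, the wide output in the question comes from apply lining groups up side by side when every group returns a Series with an identical index; treat that explanation as an assumption rather than documented behavior.

```python
import pandas as pd

# Subset of the question's data: first four months of each group.
x = pd.DataFrame({
    "group": [18527] * 4 + [18671] * 4,
    "month": list(pd.date_range("2014-09-01", periods=4, freq="MS")) * 2,
    "a": [0.534152, 0.079879, 0.032298, 0.148435,
          0.740621, 0.230585, 0.664834, 0.987118],
    "b": [0.973451, 0.354498, 0.203997, 0.352703,
          0.305298, 0.772569, 0.755219, 0.896310],
})

# Rows are assumed to be sorted by month within each group, as in the question.
grouped = x.groupby("group")

# transform keeps the original row index, so each result aligns with x row by row.
sum_a = grouped["a"].transform(lambda s: s.rolling(3).sum())
sum_b = grouped["b"].transform(lambda s: s.rolling(3).sum())

# Ratio of the 3-month rolling sums, computed within each group.
x["res"] = sum_a / sum_b

print(x)
```

The first two months of each group come out as NaN because the 3-month window is not yet full, which matches the NaN columns in the question's wide output. No stack or set_index step is needed, since the result is already in long format.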