Using the last value of previous groupings in a rolling sum function? pandas Python

Time: 2015-11-02 13:14:47

Tags: python datetime pandas dataframe

I'm trying to write a function that computes a sum/average over a rolling window defined by a specific index level.

My data looks like this:

Date (L0)   Date - (L1) Value   4-Period-L0-Sum 
12/31/2011  1/25/2012   1321    
3/31/2012   4/25/2012   1457    
6/30/2012   7/25/2012   2056    
9/30/2012   10/26/2012  3461    8295
12/31/2012  1/24/2013   2317    9291
3/31/2013   4/24/2013   2008    9842
6/30/2013   7/24/2013   1885    9671
6/30/2013   7/27/2013   1600    9386
9/30/2013   10/29/2013  1955    7880
9/30/2013   11/1/2013   1400    7325
12/31/2013  1/28/2014   1985    6993
12/31/2013  1/30/2014   1985    6993
3/31/2014   4/24/2014   1382    6367
3/31/2014   4/25/2014   1200    6185
6/30/2014   7/23/2014   2378    6963
9/30/2014   10/21/2014  3826    9389
3/31/2015   4/28/2015   2369    9773
3/31/2015   4/30/2015   2369    9773

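I'm trying to produce something like pd.rolling_sum(dataframe, window=4), except that the window should be based on the level=0 index (Date (L0)) and should use only the last value of each previous level=0 index entry. For example, the rolling sum for the row

[3/31/2014  4/24/2014] = 1382 + 1985 + 1400 + 1600

that is, the row's own Value plus the last Value of each of the three preceding Date (L0) entries (12/31/2013, 9/30/2013 and 6/30/2013).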
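To pin down the intended window, here is a brute-force sketch in plain Python. The helper name `rolling_last_sum` is hypothetical, and it assumes the rows are already sorted by Date (L0): each row's result is its own Value plus the last Value of each preceding distinct date, `window` dates in total. It is O(n²) and only meant as a reference for checking faster versions.

```python
def rolling_last_sum(dates, values, window):
    """Hypothetical reference helper (assumes rows sorted by date).

    For row r: sum of the last value of each of the `window` most recent
    distinct dates seen in rows 0..r, where the current date contributes
    row r's own value. Returns None until `window` distinct dates exist.
    """
    out = []
    for r in range(len(dates)):
        last = {}  # date -> last value seen so far; dicts keep first-insertion order
        for d, v in zip(dates[: r + 1], values[: r + 1]):
            last[d] = v  # overwriting keeps the key's position, so order is by date
        if len(last) < window:
            out.append(None)
        else:
            out.append(sum(list(last.values())[-window:]))
    return out


# Rows 6/30/2013 .. 3/31/2014 from the table above
dates = ["6/30/2013", "6/30/2013", "9/30/2013", "9/30/2013",
         "12/31/2013", "12/31/2013", "3/31/2014", "3/31/2014"]
values = [1885, 1600, 1955, 1400, 1985, 1985, 1382, 1200]
print(rolling_last_sum(dates, values, 4))
# → [None, None, None, None, None, None, 6367, 6185]
```

Only four distinct dates are present in this slice, so the first six rows have no full window yet.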

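My solution is to use an expanding window, group by level 0, then take the tail and sum:

import numpy as np
import pandas as pd

def custom_sum(datadf, period):
    # One end position per row from row `period - 1` onward
    idx_range = np.arange(len(datadf) + 1)
    tmpdf = pd.concat(
        [pd.DataFrame(
             datadf.iloc[:i]                # expanding window: rows 0..i-1
             .groupby(level=0).tail(1)      # last row of each level-0 group
             .tail(period)                  # the `period` most recent groups
             .sum(skipna=False)
         ).T
         for i in idx_range[period:]])
    tmpdf.index = datadf.index[period - 1:]
    return tmpdf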

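It works, but it runs very slowly. I'm sure there must be a better way.

One option might be pd.expanding_apply(), but it doesn't pass a DataFrame to the applied function, so there is no way to get the correct groupby index.

Thanks!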

1 answer:

Answer 0 (score: 1)

You can use groupby as follows:

from io import StringIO

import pandas as pd

text = """DateL1   DateL2 Value   Sum
12/31/2011  1/25/2012   1321
3/31/2012   4/25/2012   1457
6/30/2012   7/25/2012   2056
9/30/2012   10/26/2012  3461    8295
12/31/2012  1/24/2013   2317    9291
3/31/2013   4/24/2013   2008    9842
6/30/2013   7/24/2013   1885    9671
6/30/2013   7/27/2013   1600    9386
9/30/2013   10/29/2013  1955    7880
9/30/2013   11/1/2013   1400    7325
12/31/2013  1/28/2014   1985    6993
12/31/2013  1/30/2014   1985    6993
3/31/2014   4/24/2014   1382    6367
3/31/2014   4/25/2014   1200    6185
6/30/2014   7/23/2014   2378    6963
9/30/2014   10/21/2014  3826    9389
3/31/2015   4/28/2015   2369    9773
3/31/2015   4/30/2015   2369    9773"""

# Python 3: read from a str buffer (BytesIO would require bytes)
df = pd.read_csv(StringIO(text), sep=r"\s+", parse_dates=[0], index_col=0)

# s1: 4-period rolling sum over the last Value of each DateL1 group
s1 = df.groupby(df.index, sort=False).Value.last().rolling(4).sum()

def f(s):
    # Offset of each row from the last row of its group (0 for the last row)
    return s - s.iat[-1]

s2 = df.groupby(df.index, sort=False).Value.transform(f).fillna(0)

# s1 has one entry per date; expand it to every row before adding the offsets
print(s1.reindex(df.index) + s2)

Here is the output:

DateL1
2011-12-31     NaN
2012-03-31     NaN
2012-06-30     NaN
2012-09-30    8295
2012-12-31    9291
2013-03-31    9842
2013-06-30    9671
2013-06-30    9386
2013-09-30    7880
2013-09-30    7325
2013-12-31    6993
2013-12-31    6993
2014-03-31    6367
2014-03-31    6185
2014-06-30    6963
2014-09-30    9389
2015-03-31    9773
2015-03-31    9773
dtype: float64
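The answer relies on a decomposition: for a row r in date group g, result(r) = (rolling sum of the last Value of the 4 most recent groups, ending at g) + (Value(r) - last Value of g). The correction term is 0 for the last row of each group, which is why those rows match a plain rolling sum over group-last values. Since the question also asks about averages, here is a minimal sketch wrapping the same idea in a function with a mean option. The name `rolling_last_by_level` and its `how` parameter are hypothetical, and it assumes a single-level index sorted by date and current pandas (`.rolling(...)` rather than the removed `pd.rolling_sum`).

```python
import pandas as pd


def rolling_last_by_level(s, window, how="sum"):
    """Hypothetical helper: rolling aggregate over the last `window` index
    groups, using the last value of each earlier group and the row's own
    value for its own group. Assumes a single-level, date-sorted index."""
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    grouped = s.groupby(level=0, sort=False)
    # One entry per group: rolling sum over the groups' last values
    rolled = grouped.last().rolling(window).sum()
    # Per-row correction: swap the group's last value for this row's value
    correction = s - grouped.transform("last")
    total = rolled.reindex(s.index) + correction
    return total / window if how == "mean" else total


idx = pd.to_datetime(["2013-06-30", "2013-06-30", "2013-09-30", "2013-09-30",
                      "2013-12-31", "2013-12-31", "2014-03-31", "2014-03-31"])
s = pd.Series([1885, 1600, 1955, 1400, 1985, 1985, 1382, 1200], index=idx)
print(rolling_last_by_level(s, 4))
print(rolling_last_by_level(s, 4, how="mean"))
```

With this 8-row slice only the two 3/31/2014 rows have a full window (6367.0 and 6185.0 for the sum); every earlier row is NaN. Everything is vectorized apart from the groupby, so it avoids the repeated expanding-window groupby that makes the question's version slow.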