How to compute a rolling week-to-date sum in pandas (Python)

Asked: 2017-11-25 21:14:42

Tags: python pandas

Suppose I have the following code:

import pandas as pd
frame = pd.DataFrame(
    {
     "Date":["11/1/2017","11/2/2017","11/3/2017","11/4/2017",
             "11/5/2017","11/6/2017","11/7/2017","11/8/2017","11/9/2017","11/10/2017","11/11/2017",
             "11/12/2017"],
     "Day":["Wed","Thr","Fri","Sat",
            "Sun","Mon","Tue","Wed","Thr","Fri","Sat",
            "Sun"],
     "Sales":[5,5,10,5,
              5,10,5,5,15,5,0,
              5]
     })
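print(frame)
print()

# Flat 7-row rolling sum; min_periods=1 lets the first rows use a shorter window.
# win_type='boxcar' applies equal weights (weighted windows require SciPy).
frame['RollingSum'] = frame["Sales"].rolling(window=7, min_periods=1, win_type='boxcar').sum()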

print(frame)

This prints the following output:

          Date  Day  Sales
0    11/1/2017  Wed      5
1    11/2/2017  Thr      5
2    11/3/2017  Fri     10
3    11/4/2017  Sat      5
4    11/5/2017  Sun      5
5    11/6/2017  Mon     10
6    11/7/2017  Tue      5
7    11/8/2017  Wed      5
8    11/9/2017  Thr     15
9   11/10/2017  Fri      5
10  11/11/2017  Sat      0
11  11/12/2017  Sun      5

          Date  Day  Sales  RollingSum
0    11/1/2017  Wed      5         5.0
1    11/2/2017  Thr      5        10.0
2    11/3/2017  Fri     10        20.0
3    11/4/2017  Sat      5        25.0
4    11/5/2017  Sun      5        30.0
5    11/6/2017  Mon     10        40.0
6    11/7/2017  Tue      5        45.0
7    11/8/2017  Wed      5        45.0
8    11/9/2017  Thr     15        55.0
9   11/10/2017  Fri      5        50.0
10  11/11/2017  Sat      0        45.0
11  11/12/2017  Sun      5        45.0

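Now suppose that, instead of a flat 7-day window, I want the left endpoint of the window to be the most recent Sunday (or the first day in the dataset if there is no earlier Sunday). How can I achieve this? Desired output: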

          Date  Day  Sales      WTDSum
0    11/1/2017  Wed      5         5.0
1    11/2/2017  Thr      5        10.0
2    11/3/2017  Fri     10        20.0
3    11/4/2017  Sat      5        25.0
4    11/5/2017  Sun      5         5.0
5    11/6/2017  Mon     10        15.0
6    11/7/2017  Tue      5        20.0
7    11/8/2017  Wed      5        25.0
8    11/9/2017  Thr     15        40.0
9   11/10/2017  Fri      5        45.0
10  11/11/2017  Sat      0        45.0
11  11/12/2017  Sun      5         5.0

1 answer:

Answer 0 (score: 3)

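We use cumsum twice: once to build the group key (a running count of Sundays) and once for the calculation within each group. Here df is the question's frame.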

df['WTDSum']=df.groupby(df.Day.eq('Sun').cumsum()).Sales.cumsum()
df
Out[520]: 
          Date  Day  Sales  WTDSum
0    11/1/2017  Wed      5       5
1    11/2/2017  Thr      5      10
2    11/3/2017  Fri     10      20
3    11/4/2017  Sat      5      25
4    11/5/2017  Sun      5       5
5    11/6/2017  Mon     10      15
6    11/7/2017  Tue      5      20
7    11/8/2017  Wed      5      25
8    11/9/2017  Thr     15      40
9   11/10/2017  Fri      5      45
10  11/11/2017  Sat      0      45
11  11/12/2017  Sun      5       5
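
To make the two cumsum steps easier to follow, here is a minimal sketch that splits the one-liner into named parts, using the question's data. The names week_id, week_period and WTDSum_by_date are made up for illustration. The second half is a variant rather than part of the answer: it assumes the Date strings are always month/day/year and that the rows are already in chronological order, and it derives the week from the parsed dates instead of the Day labels.

```python
import pandas as pd

frame = pd.DataFrame({
    "Date": ["11/1/2017", "11/2/2017", "11/3/2017", "11/4/2017",
             "11/5/2017", "11/6/2017", "11/7/2017", "11/8/2017",
             "11/9/2017", "11/10/2017", "11/11/2017", "11/12/2017"],
    "Day": ["Wed", "Thr", "Fri", "Sat", "Sun", "Mon",
            "Tue", "Wed", "Thr", "Fri", "Sat", "Sun"],
    "Sales": [5, 5, 10, 5, 5, 10, 5, 5, 15, 5, 0, 5],
})

# Step 1: flag each Sunday, then take a running count of the flags.
# All rows from one Sunday up to the next share the same count, so the
# count acts as a week id (0 for the rows before the first Sunday).
week_id = frame["Day"].eq("Sun").cumsum()

# Step 2: cumulative sum of Sales that restarts within each week id.
frame["WTDSum"] = frame.groupby(week_id)["Sales"].cumsum()

# Variant (assumption: Date is month/day/year and rows are sorted by date).
dates = pd.to_datetime(frame["Date"], format="%m/%d/%Y")
# "W-SAT" weekly periods end on Saturday, so each period starts on a Sunday.
week_period = dates.dt.to_period("W-SAT")
frame["WTDSum_by_date"] = frame.groupby(week_period)["Sales"].cumsum()

print(frame)
```

On this data both columns should match the desired output in the question. The date-based variant may be the safer choice when the Day column is missing or inconsistently labelled, or when some calendar days have no row, since a skipped Sunday would otherwise fail to start a new group.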