Question

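I am trying to compute a rolling sum of a column over the last_n rows, store it in the DataFrame as a new column, and group by a different column. Here is an example of the kind of DataFrame I have: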


id.   a.   b.    c.    date.    
01    0    abc   def   1/22/20  
01    2    abc   def   1/23/20  
01    1    abc   def   1/24/20  
01    1    abc   def   1/25/20  
02    4    abc   def   1/22/20  
02    5    abc   def   1/23/20  
02    5    abc   def   1/24/20  
02    0    abc   def   1/25/20  
03    1    abc   def   1/22/20  
03    0    abc   def   1/23/20  
03    2    abc   def   1/24/20  
03    2    abc   def   1/25/20  
.
.
.

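These are arbitrary values, but suppose I want a rolling sum of column a. over the past 2 (as an example) days for each id. The output should look like this:

If the past n days do not all exist, just add 0 to the running total.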


id.   a.   b.    c.    date.    rolling_2_a
01    0    abc   def   1/22/20  0
01    2    abc   def   1/23/20  2
01    1    abc   def   1/24/20  3
01    1    abc   def   1/25/20  2
02    4    abc   def   1/22/20  4
02    5    abc   def   1/23/20  9
02    5    abc   def   1/24/20  10
02    0    abc   def   1/25/20  5
03    1    abc   def   1/22/20  1
03    0    abc   def   1/23/20  1
03    2    abc   def   1/24/20  2
03    2    abc   def   1/25/20  4

.
.
.

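I know how to sum by id, but with the date element plus the last_n requirement, I am not sure whether pandas has this functionality.

Assume the date column may also be unsorted, though examples for both cases (sorted and unsorted) would be appreciated.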

Answer 1

IIUC

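import pandas as pd

# Coerce date to datetime
df['date.'] = pd.to_datetime(df['date.'])

# Sort by id and date, then set date as the index
# (a time-based rolling window needs increasing dates within each group)
df = df.sort_values(['id.', 'date.']).set_index('date.')

# Group by id and sum 'a.' over a 2-day window ending at each row's date
df['rolling_2_a'] = df.groupby('id.')['a.'].transform(lambda x: x.rolling('2D').sum()).fillna(0)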
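For reference, here is a minimal self-contained sketch of the same idea run on the sample data from the question. The helper name add_rolling_sum and its parameters (last_n, value_col, id_col, date_col) are hypothetical, introduced only for illustration. It assumes one row per id per day. The rows are shuffled on purpose to mimic an unsorted date column.

```python
import pandas as pd

# Sample data from the question, shuffled to mimic an unsorted date column.
df = pd.DataFrame({
    "id.": ["01"] * 4 + ["02"] * 4 + ["03"] * 4,
    "a.": [0, 2, 1, 1, 4, 5, 5, 0, 1, 0, 2, 2],
    "b.": ["abc"] * 12,
    "c.": ["def"] * 12,
    "date.": ["1/22/20", "1/23/20", "1/24/20", "1/25/20"] * 3,
}).sample(frac=1, random_state=0)


def add_rolling_sum(frame, last_n, value_col="a.", id_col="id.", date_col="date."):
    """Hypothetical helper: per-id sum of value_col over the last_n days."""
    out = frame.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    # Time-based windows need increasing dates within each group, so sort first.
    out = out.sort_values([id_col, date_col]).set_index(date_col)
    new_col = f"rolling_{last_n}_{value_col.rstrip('.')}"
    # An offset window such as '2D' covers the current row's date and the
    # day before it; days missing from the data simply contribute nothing.
    out[new_col] = (
        out.groupby(id_col)[value_col]
        .transform(lambda s: s.rolling(f"{last_n}D").sum())
    )
    return out.reset_index()


result = add_rolling_sum(df, last_n=2)
print(result)
```

With last_n=2 this should reproduce the rolling_2_a column shown in the question. Because the window is defined by an offset ('2D') rather than a row count, a date missing for an id adds nothing to the sum, which matches the "just add 0" requirement.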

Cumulative sum over the last n days on a specific column, with groupby
