Pandas - add aggregating function

Time: 2020-10-06 17:57:40

Tags: python pandas

I have this DataFrame in pandas:

   day customer  amount
0    1    cust1     500
1    2    cust2     100
2    1    cust1      50
3    2    cust1     100
4    2    cust2     250
5    6    cust1      20
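
For anyone who wants to reproduce this, the frame above can be rebuilt by hand. This is only a sketch of the setup; the variable name `df` matches the one used in the rest of the post.

```python
import pandas as pd

# The example frame from the question, typed in by hand.
df = pd.DataFrame({
    "day": [1, 2, 1, 2, 2, 6],
    "customer": ["cust1", "cust2", "cust1", "cust1", "cust2", "cust1"],
    "amount": [500, 100, 50, 100, 250, 20],
})
print(df)
```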

I want to create a new column "amount2days" that sums each customer's amounts over the last two days, to get the following DataFrame:

   day customer  amount    amount2days   ----------------------------
0    1    cust1     500    500           (no past transactions)
1    2    cust2     100    100           (no past transactions)
2    1    cust1      50    550           (500 + 50, rows 0,2)
3    2    cust1     100    650           (500 + 50 + 100, rows 0,2,3)
4    2    cust2     250    350           (100 + 250, rows 1,4) 
5    6    cust1      20    20            (notice day is 6, and no day=5 for cust1)

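That is, I want to run the following (pseudo)code for each row:

df['amount2days'] = df_of_past_2_days['amount'].sum()

What is the most convenient way to do this?

The sum should be taken over days, but the day does not necessarily increase with each new row, as the example shows. I still want to sum the amounts over the past two days.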
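
One reading of the pseudocode, spelled out as a slow row-by-row reference. It assumes "past two days" means the current day and the day before, for the same customer, counting only rows up to and including the current one; that reading reproduces the expected column above. The function name `amount_2days_naive` is made up for illustration.

```python
import pandas as pd

def amount_2days_naive(df):
    """Row-by-row reference: same customer, day in [day - 1, day],
    counting only rows up to and including the current one."""
    out = []
    for pos in range(len(df)):
        row = df.iloc[pos]
        past = df.iloc[: pos + 1]  # rows seen so far, current row included
        mask = (past["customer"] == row["customer"]) & past["day"].between(
            row["day"] - 1, row["day"]
        )
        out.append(past.loc[mask, "amount"].sum())
    return pd.Series(out, index=df.index, name="amount2days")

df = pd.DataFrame({
    "day": [1, 2, 1, 2, 2, 6],
    "customer": ["cust1", "cust2", "cust1", "cust1", "cust2", "cust1"],
    "amount": [500, 100, 50, 100, 250, 20],
})
print(amount_2days_naive(df).tolist())  # [500, 100, 550, 650, 350, 20]
```

This is quadratic in the number of rows, so it is only useful as a check for a faster solution.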

1 Answer:

Answer 0: (score: 1)

I think this is just a rolling sum over days:

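import pandas as pd

def get_roll(x):
    # Turn the integer day into a timestamp so a time-based window can be used;
    # the anchor date is arbitrary. Days must be non-decreasing within each customer.
    s = pd.Series(x['amount'].values,
                  index=pd.to_datetime('1900-01-01') + pd.to_timedelta(x['day'], unit='D')
                 )
    # '2D' covers the current day and the day before
    return pd.Series(s.rolling('2D').sum().values, index=x.index)

# apply per customer, then drop the customer level so the result aligns with df's index
df['amount2days'] = (df.groupby('customer').apply(get_roll)
                       .reset_index(level=0, drop=True)
                    )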

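Output (shown for an input whose days are 1, 1, 1, 2, 2, 3 with index 1 to 6, which differs from the frame in the question):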

   day customer  amount  amount2days
1    1    cust1     500        500.0
2    1    cust2     100        100.0
3    1    cust1      50        550.0
4    2    cust1     100        650.0
5    2    cust2     250        350.0
6    3    cust1      20        120.0
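
To see why the rolling('2D') window does the job, here is a small sketch on cust1's rows from the question (days 1, 1, 2, 6). With the default settings the window for a row at time t covers (t - 2 days, t], and rows sharing a timestamp are added one at a time in their original order. The variable names are illustrative only.

```python
import pandas as pd

# cust1's rows from the question: days 1, 1, 2, 6.
days = pd.Series([1, 1, 2, 6])
amounts = [500, 50, 100, 20]

# Any anchor date works; only the spacing between timestamps matters.
idx = pd.DatetimeIndex(pd.to_datetime("1900-01-01") + pd.to_timedelta(days, unit="D"))
s = pd.Series(amounts, index=idx)

rolled = s.rolling("2D").sum()
print(rolled.tolist())  # [500.0, 550.0, 650.0, 20.0]
```

The last value stays at 20 because day 6 has no cust1 rows on day 5, which matches the expected column in the question.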

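Option 2: since the window is only two days, each day's running amount only needs the previous day's total added to it, so we can use shift. The shifted total is kept only when the previous recorded day is exactly one day earlier:

# running total within each (customer, day)
df['amount2days'] = df.groupby(['customer', 'day'])['amount'].cumsum()

# one row per (customer, day) holding that day's total
daily = df.groupby(['customer', 'day'], as_index=False)['amount'].sum()
g = daily.groupby('customer')
# shift the previous recorded day's total; use 0 unless it is exactly one day earlier
daily['prev'] = g['amount'].shift().where(daily['day'] - g['day'].shift() == 1, 0)

# add the previous day's total to every row of the matching (customer, day)
df['amount2days'] += (df[['customer', 'day']]
   .merge(daily[['customer', 'day', 'prev']], on=['customer', 'day'], how='left')['prev']
   .to_numpy()
)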
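
One thing to watch with shift: it returns the previous recorded day, not necessarily the calendar day before. In the question's frame cust1 jumps from day 2 to day 6, so a plain shift would carry day 2's total into day 6. A minimal sketch of a guard, using cust1's daily totals (the names `prev_total`, `prev_day` and `guarded` are illustrative):

```python
import pandas as pd

# cust1's daily totals from the question's frame: days 1, 2 and 6.
daily = pd.DataFrame({
    "customer": ["cust1"] * 3,
    "day": [1, 2, 6],
    "amount": [550, 100, 20],
})

g = daily.groupby("customer")
prev_total = g["amount"].shift()  # previous recorded day's total
prev_day = g["day"].shift()       # the day that total belongs to

# Keep the shifted total only when it comes from exactly one day earlier.
guarded = prev_total.where(daily["day"] - prev_day == 1, 0)

print(prev_total.tolist())  # [nan, 550.0, 100.0]
print(guarded.tolist())     # [0.0, 550.0, 0.0]
```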