I have this dataframe in pandas:
   day customer  amount
0    1    cust1     500
1    2    cust2     100
2    1    cust1      50
3    2    cust1     100
4    2    cust2     250
5    6    cust1      20
I want to create a new column "amount2days" that sums each customer's amounts over the last two days, to get the following dataframe:
   day customer  amount  amount2days
0    1    cust1     500          500   (no past transactions)
1    2    cust2     100          100   (no past transactions)
2    1    cust1      50          550   (500 + 50, rows 0,2)
3    2    cust1     100          650   (500 + 50 + 100, rows 0,2,3)
4    2    cust2     250          350   (100 + 250, rows 1,4)
5    6    cust1      20           20   (notice day is 6, and no day=5 for cust1)
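That is, for each row I want to run the following (pseudo)code:
df['amount2days'] = df_of_past_2_days['amount'].sum()
What is the most convenient way to do this?
The sum I am after is by day, but the day does not necessarily increase with each new row, as the example shows. I still want to sum the amounts over the past two days.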
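To pin down what the pseudocode means, here is a slow row-by-row reference sketch. It assumes "past two days" means the current day and the day before it, and that only rows at or above the current row count as past transactions (both read off the expected output above). The helper name `amount_last_2_days` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "day": [1, 2, 1, 2, 2, 6],
    "customer": ["cust1", "cust2", "cust1", "cust1", "cust2", "cust1"],
    "amount": [500, 100, 50, 100, 250, 20],
})

def amount_last_2_days(frame, i):
    """Sum the amounts of row i's customer for day-1 and day, up to row i."""
    row = frame.iloc[i]
    past = frame.iloc[: i + 1]  # assumption: only rows at or before row i count
    mask = (
        (past["customer"] == row["customer"])
        & (past["day"] >= row["day"] - 1)
        & (past["day"] <= row["day"])
    )
    return past.loc[mask, "amount"].sum()

# Quadratic in the number of rows, so only useful as a reference on small data.
df["amount2days"] = [amount_last_2_days(df, i) for i in range(len(df))]
print(df)
```

This reproduces the expected column (500, 100, 550, 650, 350, 20) and can serve as a check for faster solutions.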
Answer 0 (score: 1)
I think this is just a rolling sum over days:
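import pandas as pd

def get_roll(x):
    # turn the integer day into a date so a time-based window can be used
    s = pd.Series(x['amount'].values,
                  index=pd.to_datetime('1900-01-01') + pd.to_timedelta(x['day'], unit='D')
                  )
    # 2-day rolling sum, put back on the original row labels
    return pd.Series(s.rolling('2D').sum().values, index=x.index)

df['amount2days'] = (df.groupby('customer').apply(get_roll)
                       .reset_index(level=0, drop=True)
                    )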
Output:
   day customer  amount  amount2days
1    1    cust1     500        500.0
2    1    cust2     100        100.0
3    1    cust1      50        550.0
4    2    cust1     100        650.0
5    2    cust2     250        350.0
6    3    cust1      20        120.0
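The output above appears to come from a frame whose last row is day 3. As a hedged check, the same rolling idea can be run on the frame from the question, where the last row is day 6. This sketch loops over the groups instead of using `apply`, and assumes each customer's rows already appear in non-decreasing day order (a time-based `rolling` needs a monotonic index). The function name `rolling_2d` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "day": [1, 2, 1, 2, 2, 6],
    "customer": ["cust1", "cust2", "cust1", "cust1", "cust2", "cust1"],
    "amount": [500, 100, 50, 100, 250, 20],
})

def rolling_2d(group):
    # Map integer day numbers onto dates; the base date is arbitrary.
    dates = pd.Timestamp("1900-01-01") + pd.to_timedelta(group["day"], unit="D")
    s = pd.Series(group["amount"].to_numpy(), index=pd.DatetimeIndex(dates))
    # '2D' covers the current day and the day before it.
    return pd.Series(s.rolling("2D").sum().to_numpy(), index=group.index)

# One result per customer, keyed by the original row labels.
pieces = [rolling_2d(g) for _, g in df.groupby("customer")]
df["amount2days"] = pd.concat(pieces)  # aligned back on the index
print(df)
```

Here row 5 gets 20.0 rather than 120.0, because day 6 has no cust1 transactions on day 5.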
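Option 2: Since you only want to accumulate over two days, today's amount only needs to be added to the previous day's amount. So we can make use of shift: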
# running total within each (customer, day) group
df['amount2days'] = df.groupby(['customer', 'day'])['amount'].cumsum()
# shift the last item of the previous day and add
df['amount2days'] += (df.drop_duplicates(['day', 'customer'], keep='last')
                        .groupby(['customer'])['amount2days'].shift()
                        .reindex(df.index)
                        .ffill()
                        .fillna(0)
                     )
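One caveat with the shift version: `shift` picks up the previous recorded day for a customer, which need not be the day before (for example day 2 followed by day 6), and `ffill` after `reindex` can carry a value over from a different customer's row. A possible variant, swapping the shift for an explicit lookup of day - 1, is sketched below. It assumes each customer's rows appear in day order, and the names `running`, `daily` and `prev_keys` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "day": [1, 2, 1, 2, 2, 6],
    "customer": ["cust1", "cust2", "cust1", "cust1", "cust2", "cust1"],
    "amount": [500, 100, 50, 100, 250, 20],
})

# Running total within each (customer, day), in row order.
running = df.groupby(["customer", "day"])["amount"].cumsum()

# Full total for every (customer, day) pair that actually occurs.
daily = df.groupby(["customer", "day"])["amount"].sum()

# Look up the total of the previous calendar day for each row;
# (customer, day - 1) pairs that never occur contribute 0.
prev_keys = pd.MultiIndex.from_arrays([df["customer"], df["day"] - 1])
prev_total = daily.reindex(prev_keys).fillna(0).to_numpy()

df["amount2days"] = running + prev_total
print(df)
```

On the question's frame this gives 500, 100, 550, 650, 350, 20, with row 5 left at 20 because cust1 has nothing on day 5.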