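I have a dataset whose level-0 index is the customer ID and whose level-1 index is voucher_date. The column I am interested in is 'inflow', and for each customer I want a rolling sum over the trailing 180 days of voucher dates. However, a row should only count toward the sum if its 'original_voucher_date' (shown as ori_voucher_date in the sample data) is less than or equal to the maximum 'voucher_date' index in that rolling window.
The code below works well for the unconditional rolling sum.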
df_unpaid_c1_hmm_date_lvl1 = df_unpaid_c1_hmm_date.copy()
df_unpaid_c1_hmm_date_lvl1.index = df_unpaid_c1_hmm_date_lvl1.index.droplevel(0)
df_unpaid_c1_hmm_date_lvl1.groupby('customer_id').inflow.rolling('180D').sum().reset_index(level=0, drop=True)
Here is a dummy dataset:
customer_id  voucher_date  inflow   ori_voucher_date
1            2009-10-23    17000.0  2009-12-23
             2010-02-26    10000.0  2010-10-26
             2011-12-29    0.0      2011-02-29
             2012-03-31    0.0      2012-05-31
             2012-07-23    1000.0   2013-07-23
             2012-09-24    500.0    2012-11-24
             2012-10-19    15200.0  2012-10-19
             2012-10-30    1000.0   2012-12-30
             2012-12-25    0.0      2014-12-25
2            2013-01-15    0.0      2013-06-15
2            2013-02-22    20000.0  2013-05-22
Expected result:
customer_id  voucher_date  inflow   ori_voucher_date  rollin
1            2009-10-23    17000.0  2009-12-23        0.0
             2010-02-26    10000.0  2010-10-26        17000.0
             2011-12-29    0.0      2011-02-29        0.0
             2012-03-31    0.0      2012-05-31        0.0
             2012-07-23    1000.0   2013-07-23        0.0
             2012-09-24    500.0    2012-11-24        0.0
             2012-10-19    15200.0  2012-10-19        15200.0
             2012-10-30    1000.0   2012-12-30        15200.0
             2012-12-25    0.0      2014-12-25        15700.0
2            2013-01-15    0.0      2013-06-15        0.0
2            2013-02-22    20000.0  2013-05-22        0.0
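To make the condition concrete, here is a minimal brute-force sketch of the rule as I understand it, not a tuned solution. It rebuilds the dummy dataset and, for each row, sums 'inflow' over rows of the same customer whose voucher_date lies in the 180-day window ending at the current row and whose ori_voucher_date is on or before the current voucher_date. The helper name conditional_rolling_sum is hypothetical. The window is assumed to be (t - 180D, t], matching the default of rolling('180D'). The value 2011-02-29 in the sample is not a real calendar date, so it is coerced to NaT here and treated as failing the condition (an assumption; that row's inflow is 0 either way).

```python
import pandas as pd

# Sample rows copied from the dummy dataset; customer_id is written on every row.
rows = [
    (1, "2009-10-23", 17000.0, "2009-12-23"),
    (1, "2010-02-26", 10000.0, "2010-10-26"),
    (1, "2011-12-29", 0.0, "2011-02-29"),  # invalid calendar date, becomes NaT below
    (1, "2012-03-31", 0.0, "2012-05-31"),
    (1, "2012-07-23", 1000.0, "2013-07-23"),
    (1, "2012-09-24", 500.0, "2012-11-24"),
    (1, "2012-10-19", 15200.0, "2012-10-19"),
    (1, "2012-10-30", 1000.0, "2012-12-30"),
    (1, "2012-12-25", 0.0, "2014-12-25"),
    (2, "2013-01-15", 0.0, "2013-06-15"),
    (2, "2013-02-22", 20000.0, "2013-05-22"),
]
df = pd.DataFrame(
    rows, columns=["customer_id", "voucher_date", "inflow", "ori_voucher_date"]
)
df["voucher_date"] = pd.to_datetime(df["voucher_date"])
# Assumption: unparseable dates are turned into NaT rather than raising.
df["ori_voucher_date"] = pd.to_datetime(df["ori_voucher_date"], errors="coerce")
df = df.set_index(["customer_id", "voucher_date"]).sort_index()


def conditional_rolling_sum(group, window=pd.Timedelta("180D")):
    """Hypothetical helper: O(n^2) conditional rolling sum for one customer."""
    dates = group.index.get_level_values("voucher_date")
    inflow = group["inflow"].to_numpy()
    ori = group["ori_voucher_date"]
    totals = []
    for current in dates:
        # Rows inside the window (current - 180D, current].
        in_window = (dates > current - window) & (dates <= current)
        # Rows whose original voucher date is not later than the window's
        # latest voucher_date; comparisons against NaT evaluate to False.
        eligible = (ori <= current).to_numpy()
        totals.append(inflow[in_window & eligible].sum())
    return pd.Series(totals, index=group.index)


# Compute per customer and align back on the (customer_id, voucher_date) index.
parts = [conditional_rolling_sum(g) for _, g in df.groupby(level="customer_id")]
df["rollin"] = pd.concat(parts)
print(df)
```

On the dummy dataset this reproduces the 'rollin' column from the expected result. It loops over every row per customer, so it is quadratic in group size, and the final assignment assumes each (customer_id, voucher_date) pair is unique.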