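I want to calculate, for each customer, the number of days between their last transaction before subscribing and their subscription date. If every customer had the same subscription date, I could simply filter out the trx dates that fall after it, but the subscription date differs from customer to customer.
Initial dataframe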
import pandas as pd

ad = {'customer':['Clark','Stones','Fay','Stones','Clark','Clark','Clark'],
'subscribe_date':['2020-11-30','2020-07-01','2021-01-02','2020-07-12','2020-11-30','2020-11-30','2020-11-30'],
'trx_date':['2020-12-30','2020-07-12','2020-07-14','2020-07-25','2021-02-01','2020-09-01','2020-11-27'],
'trx_amount':[100,90,50,45,20,30,50],
}
ad = pd.DataFrame(ad)
ad.sort_values(by=['customer','trx_date'])
Expected dataframe
ad2 = {'customer':['Clark','Stones','Fay'],
'subscribe_date':['2020-11-30','2020-07-01','2021-01-02'],
'days_since_last_succ_tx_BEFORE_subs':['3','0','0']}
ad2 = pd.DataFrame(ad2)
ad2
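Explanation: Clark has made 4 transactions. He subscribed on 2020-11-30. His last transaction before subscribing was on 2020-11-27, so the value is 3.
If a customer never made a transaction before subscribing, I would keep the value np.NaN.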
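The Clark example can be checked by hand with plain timestamp arithmetic. This is only a minimal sketch of the subtraction; the variable names are made up for illustration:

```python
import pandas as pd

# Clark's subscription date and his last transaction before it
subscribe = pd.Timestamp('2020-11-30')
last_trx_before = pd.Timestamp('2020-11-27')

# Subtracting two timestamps gives a Timedelta; .days is the whole-day part
gap_days = (subscribe - last_trx_before).days
print(gap_days)
```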
Answer 0 (score: 2)
Use:
#convert to datetimes
ad['trx_date'] = pd.to_datetime(ad['trx_date'])
ad['subscribe_date'] = pd.to_datetime(ad['subscribe_date'])
#get days difference
ad['days'] = ad['subscribe_date'].sub(ad['trx_date']).dt.days
#replace negative values with NaN
ad['days'] = ad['days'].mask(ad['days'].lt(0))
#get rows by minimal days per customer
cols = ['customer','subscribe_date','days']
df = ad.sort_values(['customer','days']).drop_duplicates('customer')[cols]
print (df)
  customer subscribe_date   days
6    Clark     2020-11-30    3.0
2      Fay     2021-01-02  172.0
1   Stones     2020-07-01    NaN
Answer 1 (score: 1)
# Convert columns to datetime
ad['subscribe_date'] = pd.to_datetime(ad['subscribe_date'])
ad['trx_date'] = pd.to_datetime(ad['trx_date'])
# Calculate timedelta
ad['time_delta'] = ad['subscribe_date'] - ad['trx_date']
# mark transactions made after the subscription date as invalid
mask_after_subscribe = ad['subscribe_date'].lt(ad['trx_date'])
ad.loc[mask_after_subscribe, 'time_delta'] = pd.NaT
# group by customer and return the minimal value of time_delta
time_delta_minimal = ad.groupby('customer')['time_delta'].min()
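For reference, here is a self-contained sketch of the same idea that returns whole days instead of timedeltas. It rebuilds the sample data from the question (without the unused trx_amount column); the names `days` and `result` are just illustrative, and the smallest non-negative gap per customer is taken to be the last transaction before subscribing:

```python
import pandas as pd

ad = pd.DataFrame({
    'customer': ['Clark', 'Stones', 'Fay', 'Stones', 'Clark', 'Clark', 'Clark'],
    'subscribe_date': ['2020-11-30', '2020-07-01', '2021-01-02', '2020-07-12',
                       '2020-11-30', '2020-11-30', '2020-11-30'],
    'trx_date': ['2020-12-30', '2020-07-12', '2020-07-14', '2020-07-25',
                 '2021-02-01', '2020-09-01', '2020-11-27'],
})
ad['subscribe_date'] = pd.to_datetime(ad['subscribe_date'])
ad['trx_date'] = pd.to_datetime(ad['trx_date'])

# Days from each transaction to the subscription; negative means the
# transaction happened after subscribing.
days = (ad['subscribe_date'] - ad['trx_date']).dt.days

# Keep only non-negative gaps; all other rows become NaN.
ad['days'] = days.where(days >= 0)

# min() skips NaN, and yields NaN for a customer with no valid gap at all.
result = ad.groupby('customer')['days'].min()
print(result)
```

With the sample data, Stones has no transaction on or before either subscription date, so that customer ends up with NaN, which matches the requirement in the question.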