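I have two dataframes of different sizes (about 100k records). df1 contains customer IDs and purchase dates. df2 contains customer IDs and visit dates.
I want to create a new column in df1 that counts how many times the customer visited the store before each purchase (using "Visit Date" from df2). The condition is that the visit date must fall within the 30 days before the purchase date.
Here is some sample data: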
df1:
df1 = pd.DataFrame({'Cust ID': [1,2,2,2,3,3], 'Transaction ID':[1001,1002,1003,1004,1005,1006], 'Purchase Date':["1/20/2017", "1/20/2018", "1/20/2017", "1/5/2017","1/20/2017","1/20/2017"]})
   Cust ID  Transaction ID Purchase Date
0        1            1001     1/20/2017
1        2            1002     1/20/2018
2        2            1003     1/20/2017
3        2            1004      1/5/2017
4        3            1005     1/20/2017
5        3            1006     1/20/2017
df2:
df2 = pd.DataFrame({'Cust ID': [1,1,1,1,1,2,2,2], 'Visit Date':["1/2/2017", "1/3/2017", "1/4/2017", "12/5/2017", "1/23/2017", "1/2/2017","1/3/2017","1/24/2017"]})
   Cust ID Store-ID Visit Date
0        1       A1   1/2/2017
1        1       A1   1/3/2017
2        1       A1   1/4/2017
3        1       A1  12/5/2017
4        1       A1  1/23/2017
5        2       A1   1/2/2017
6        2       A1   1/3/2017
7        2       A1  1/24/2017
Expected output:
   Cust ID  Transaction ID Purchase Date  Count of (Past 1-month visit)
0        1            1001     1/20/2017                              3
1        2            1002     1/20/2018                              0
2        2            1003     1/20/2017                              2
3        2            1004      1/5/2017                              2
4        3            1005     1/20/2017                              0
5        3            1006     1/20/2017                              0
I am fairly new to Python and pandas. Any help is much appreciated.
Regards, Karthik.
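Answer 0: (score: 0)
Merge df2 onto df1 by customer, subtract the visit date from the purchase date, keep only the visits that fall in the 30 days before the purchase, count them per transaction, and merge the counts back into the original df1.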
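import datetime
import pandas as pd
# Parse the date strings so they can be subtracted.
df1['Purchase Date'] = pd.to_datetime(df1['Purchase Date'], format='%m/%d/%Y')
df2['Visit Date'] = pd.to_datetime(df2['Visit Date'], format='%m/%d/%Y')
# Pair every visit with every purchase made by the same customer.
df3 = df2.merge(df1, on='Cust ID')
# Time elapsed between the visit and the purchase.
df3['Past_1M'] = df3['Purchase Date'] - df3['Visit Date']
# Keep visits made 0 to 30 days before the purchase.
df3 = df3[(df3['Past_1M'] <= datetime.timedelta(days=30)) & (df3['Past_1M'] >= datetime.timedelta(0))]
# Count the remaining visits per transaction.
df3 = df3.groupby(['Cust ID', 'Transaction ID']).agg('count').reset_index()
# Merge the counts back onto df1; transactions without a qualifying visit get 0.
df3 = df1.merge(df3, on=['Cust ID', 'Transaction ID'], how='outer').fillna(0)
# Keep Cust ID, Transaction ID, Purchase Date_x and Past_1M (selected by position).
df3 = df3.iloc[:,[0,1,2,5]]
df3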
   Cust ID  Transaction ID Purchase Date_x  Past_1M
0        1            1001      2017-01-20      3.0
1        2            1002      2018-01-20      0.0
2        2            1003      2017-01-20      2.0
3        2            1004      2017-01-05      2.0
4        3            1005      2017-01-20      0.0
5        3            1006      2017-01-20      0.0
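The same merge-and-filter idea could also be wrapped in a small function, roughly as sketched below. The function name count_recent_visits, the window_days parameter and the Past_1M_Visits column name are made up for illustration and are not part of the original post. The sketch assumes Transaction ID is unique in df1 and that the dates use the month/day/year format from the sample data. It maps the counts back onto df1 by Transaction ID, so the row order of df1 is kept and the counts stay integers.

```python
import pandas as pd


def count_recent_visits(purchases, visits, window_days=30):
    """Return a copy of `purchases` with a visit-count column added.

    Hypothetical helper: counts visits made 0 to `window_days` days
    before each purchase by the same customer.
    """
    purchases = purchases.copy()
    visits = visits.copy()
    # Parse the date strings (assumes month/day/year, as in the sample).
    purchases['Purchase Date'] = pd.to_datetime(
        purchases['Purchase Date'], format='%m/%d/%Y')
    visits['Visit Date'] = pd.to_datetime(
        visits['Visit Date'], format='%m/%d/%Y')

    # Pair each purchase with every visit by the same customer.
    pairs = purchases.merge(visits[['Cust ID', 'Visit Date']], on='Cust ID')
    gap = pairs['Purchase Date'] - pairs['Visit Date']
    in_window = (gap >= pd.Timedelta(0)) & (gap <= pd.Timedelta(days=window_days))

    # Number of qualifying visits per transaction.
    counts = pairs[in_window].groupby('Transaction ID').size()

    # Transactions with no qualifying visit are missing from `counts`.
    purchases['Past_1M_Visits'] = (
        purchases['Transaction ID'].map(counts).fillna(0).astype(int))
    return purchases


df1 = pd.DataFrame({
    'Cust ID': [1, 2, 2, 2, 3, 3],
    'Transaction ID': [1001, 1002, 1003, 1004, 1005, 1006],
    'Purchase Date': ["1/20/2017", "1/20/2018", "1/20/2017",
                      "1/5/2017", "1/20/2017", "1/20/2017"]})
df2 = pd.DataFrame({
    'Cust ID': [1, 1, 1, 1, 1, 2, 2, 2],
    'Visit Date': ["1/2/2017", "1/3/2017", "1/4/2017", "12/5/2017",
                   "1/23/2017", "1/2/2017", "1/3/2017", "1/24/2017"]})

result = count_recent_visits(df1, df2)
print(result)
```

On the sample data this gives counts of 3, 0, 2, 2, 0, 0 for transactions 1001 to 1006. Note that the merge creates one row per purchase-visit pair for each customer, so with around 100k records it may use a lot of memory if individual customers have many purchases and many visits.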