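Pandas: count the number of times a customer visited the store before a purchase (only visits within the 30 days before the purchase date)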

Time: 2020-06-12 04:41:05

Tags: python pandas

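I have two dataframes of different sizes (about 100k records). df1 contains the customer ID and purchase date. df2 contains the customer ID and visit date.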

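I want to create a new column in df1 that counts how many times the customer visited the store before the purchase (using "Visit Date" from df2). The condition is that the visit date must fall within the 30 days before the purchase date.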

Below is the sample data.

df1:

df1 = pd.DataFrame({'Cust ID': [1,2,2,2,3,3], 'Transaction ID':[1001,1002,1003,1004,1005,1006], 'Purchase Date':["1/20/2017", "1/20/2018", "1/20/2017", "1/5/2017","1/20/2017","1/20/2017"]})

   Cust ID  Transaction ID  Purchase Date
0        1            1001      1/20/2017
1        2            1002      1/20/2018
2        2            1003      1/20/2017
3        2            1004       1/5/2017
4        3            1005      1/20/2017
5        3            1006      1/20/2017

df2:

df2 = pd.DataFrame({'Cust ID': [1,1,1,1,1,2,2,2],  'Visit Date':["1/2/2017", "1/3/2017", "1/4/2017", "12/5/2017", "1/23/2017", "1/2/2017","1/3/2017","1/24/2017"]})

   Cust ID  Store-ID  Visit Date
0        1        A1    1/2/2017
1        1        A1    1/3/2017
2        1        A1    1/4/2017
3        1        A1   12/5/2017
4        1        A1   1/23/2017
5        2        A1    1/2/2017
6        2        A1    1/3/2017
7        2        A1   1/24/2017

Expected output:

   Cust ID  Transaction ID  Purchase Date  Count of (Past 1-month visit)
0        1            1001      1/20/2017                              3
1        2            1002      1/20/2018                              0
2        2            1003      1/20/2017                              2
3        2            1004       1/5/2017                              2
4        3            1005      1/20/2017                              0
5        3            1006      1/20/2017                              0

I am fairly new to Python and pandas. Any help would be greatly appreciated.

Regards, Karthik.

1 Answer:

Answer 0: (score: 0)

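Merge the visits onto the purchases by customer, subtract the visit date from the purchase date, and keep only the visits that fall within the 30 days before the purchase. Then count the remaining visits per transaction and merge the counts back into the original df1.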

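import datetime
import pandas as pd

# Parse the date strings so that they can be subtracted.
df1['Purchase Date'] = pd.to_datetime(df1['Purchase Date'], format='%m/%d/%Y')
df2['Visit Date'] = pd.to_datetime(df2['Visit Date'], format='%m/%d/%Y')
# Pair every visit with every purchase made by the same customer.
df3 = df2.merge(df1, on='Cust ID')
df3['Past_1M'] = df3['Purchase Date'] - df3['Visit Date']
# Keep visits made 0 to 30 days before the purchase (both ends inclusive).
df3 = df3[(df3['Past_1M'] <= datetime.timedelta(30)) & (df3['Past_1M'] >= datetime.timedelta(0))]
# After the count, every remaining column holds the number of visits per transaction.
df3 = df3.groupby(['Cust ID', 'Transaction ID']).agg('count').reset_index()
# Re-attach to df1; transactions with no qualifying visit get 0.
df3 = df1.merge(df3, on=['Cust ID', 'Transaction ID'], how='outer').fillna(0)
# Positions 0, 1, 2, 5 are Cust ID, Transaction ID, Purchase Date_x, Past_1M.
df3 = df3.iloc[:,[0,1,2,5]]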

df3
   Cust ID  Transaction ID  Purchase Date_x  Past_1M
0        1            1001       2017-01-20      3.0
1        2            1002       2018-01-20      0.0
2        2            1003       2017-01-20      2.0
3        2            1004       2017-01-05      2.0
4        3            1005       2017-01-20      0.0
5        3            1006       2017-01-20      0.0
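
The same idea can also be wrapped in a function that selects columns by name instead of by position and returns integer counts. This is only a sketch, not a tuned solution: the function name `count_recent_visits` and its `days` parameter are made up for illustration, and it assumes that `Transaction ID` is unique in df1 and that a visit on the purchase day itself counts. Like the code above, it pairs every visit with every purchase of the same customer, so memory use grows with the number of such pairs.

```python
import pandas as pd

def count_recent_visits(purchases, visits, days=30):
    """Hypothetical helper: count visits made 0..days days before each purchase."""
    p = purchases.copy()  # work on copies so the caller's frames stay untouched
    v = visits.copy()
    p['Purchase Date'] = pd.to_datetime(p['Purchase Date'], format='%m/%d/%Y')
    v['Visit Date'] = pd.to_datetime(v['Visit Date'], format='%m/%d/%Y')

    # Pair each purchase with every visit by the same customer.
    pairs = p.merge(v[['Cust ID', 'Visit Date']], on='Cust ID', how='inner')
    gap = pairs['Purchase Date'] - pairs['Visit Date']
    in_window = (gap >= pd.Timedelta(0)) & (gap <= pd.Timedelta(days=days))

    # Number of qualifying visits per transaction (assumes unique Transaction ID).
    counts = pairs.loc[in_window].groupby('Transaction ID').size()

    # Map the counts back; transactions without a qualifying visit get 0.
    p['Past_1M'] = p['Transaction ID'].map(counts).fillna(0).astype(int)
    return p

df1 = pd.DataFrame({'Cust ID': [1, 2, 2, 2, 3, 3],
                    'Transaction ID': [1001, 1002, 1003, 1004, 1005, 1006],
                    'Purchase Date': ["1/20/2017", "1/20/2018", "1/20/2017",
                                      "1/5/2017", "1/20/2017", "1/20/2017"]})
df2 = pd.DataFrame({'Cust ID': [1, 1, 1, 1, 1, 2, 2, 2],
                    'Visit Date': ["1/2/2017", "1/3/2017", "1/4/2017", "12/5/2017",
                                   "1/23/2017", "1/2/2017", "1/3/2017", "1/24/2017"]})

result = count_recent_visits(df1, df2)
print(result)
```

For the sample data the `Past_1M` column comes out as 3, 0, 2, 2, 0, 0, matching the output above but as integers.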