Question

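I need some quick help. I want to get a list of customer_id values and first purchase dates for customers who made a second purchase within 30 days of their first purchase.

For example, customer_id 1, 2 and 3 each made a second purchase within 30 days.

I need customer_id 1, 2 and 3 together with their respective first purchase dates.

I have more than 100,000 customer_id values.

How can I do this in Pandas?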

Answer 1

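You can use groupby. Sort by customer and date first so that the first two rows of each group are the first two purchases, and skip customers who have only one purchase:

# Assumes purchase_date is already a datetime column
df = df.sort_values(['Customer_id', 'purchase_date'])
# True for customers whose second purchase is fewer than 30 days after the first
s = df.groupby('Customer_id')['purchase_date'].apply(lambda x: len(x) > 1 and (x.iloc[1] - x.iloc[0]).days < 30)
# First row per qualifying customer, i.e. the first purchase
out = df.loc[df.Customer_id.isin(s.index[s])].drop_duplicates('Customer_id')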
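With more than 100,000 customers, calling a Python lambda once per group may become slow. Below is a minimal vectorized sketch of the same idea. The sample data is hypothetical, the names `rank`, `first`, `second` and `gap` are introduced only for illustration, and `purchase_date` is assumed to be a datetime column.

```python
import pandas as pd

# Hypothetical sample data (not from the question)
df = pd.DataFrame({
    "Customer_id": [1, 1, 2, 2, 3, 4, 4],
    "purchase_date": pd.to_datetime([
        "2021-01-01", "2021-01-15",   # customer 1: 14 days apart
        "2021-02-01", "2021-04-01",   # customer 2: 59 days apart
        "2021-03-01",                 # customer 3: single purchase
        "2021-05-10", "2021-05-20",   # customer 4: 10 days apart
    ]),
})

df = df.sort_values(["Customer_id", "purchase_date"])

# Position of each purchase within its customer: 0 = first, 1 = second, ...
rank = df.groupby("Customer_id").cumcount()

first = df.loc[rank == 0].set_index("Customer_id")["purchase_date"]
second = df.loc[rank == 1].set_index("Customer_id")["purchase_date"]

# Customers without a second purchase get NaT, which compares as False below
gap = second.reindex(first.index) - first

out = first[gap < pd.Timedelta(days=30)].reset_index()
print(out)
```

Here `out` holds one row per qualifying customer with the first purchase date; in this sample those are customers 1 and 4.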

Answer 2

Here is one way:

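# Cutoff per customer: first purchase date plus 30 days
# (assumes purchase_date is already a datetime column)
cutoff = df.groupby('Customer_id')['purchase_date'].min() + pd.Timedelta(days=30)

# Keep only the purchases made before each customer's cutoff
df2 = df.loc[df['purchase_date'].lt(df['Customer_id'].map(cutoff))]

# Customers with more than one purchase in that window, with their earliest row
df2 = (df2.loc[df2.duplicated('Customer_id', keep=False)]
       .sort_values('purchase_date')
       .groupby('Customer_id').first())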

Answer 3

You can build a boolean mask that selects the customers who made a second purchase within 30 days, as follows:

# Pre-processing: convert to datetime first, then sort, so the sort is chronological
df['purchase_date'] = pd.to_datetime(df['purchase_date'])
df = df.sort_values(['Customer_id', 'purchase_date'])

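# Set boolean mask: gap to the previous purchase, checked only on each customer's second purchase
gap = df['purchase_date'] - df['purchase_date'].groupby(df['Customer_id']).shift()
is_second = df.groupby('Customer_id').cumcount().eq(1)
mask = ((gap.dt.days.le(30) & is_second)
            .groupby(df['Customer_id'])
            .transform('any')
       )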

With this mask, we can already filter the transaction records of customers who made a second purchase within 30 days:

df[mask]

To go further and show each customer_id with its first purchase date, you can use:

df[mask].groupby('Customer_id', as_index=False).first()
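As a rough end-to-end check, the steps can be wrapped in a small function and run on toy data. The helper name `first_purchase_of_quick_repeaters`, its `days` parameter and the sample frame are all hypothetical. This sketch tests only each customer's second purchase, so a customer whose third purchase closely follows the second is not selected.

```python
import pandas as pd

# Hypothetical sample data; dates arrive as strings
raw = pd.DataFrame({
    "Customer_id": [1, 1, 2, 3, 3, 3],
    "purchase_date": ["2021-01-01", "2021-01-20",                 # 19 days apart
                      "2021-02-01",                               # single purchase
                      "2021-03-01", "2021-04-30", "2021-05-05"],  # 60 days, then 5 days
})

def first_purchase_of_quick_repeaters(df, days=30):
    """Hypothetical helper: first purchase of customers whose second
    purchase came within `days` days of the first."""
    df = df.copy()
    # Convert before sorting so the order is chronological, not lexical
    df["purchase_date"] = pd.to_datetime(df["purchase_date"])
    df = df.sort_values(["Customer_id", "purchase_date"])

    # Gap to the previous purchase of the same customer (NaT on the first row)
    gap = df["purchase_date"] - df.groupby("Customer_id")["purchase_date"].shift()
    # Only the second purchase of each customer is tested
    is_second = df.groupby("Customer_id").cumcount().eq(1)

    hit = gap.dt.days.le(days) & is_second
    mask = hit.groupby(df["Customer_id"]).transform("any")
    return df[mask].groupby("Customer_id", as_index=False).first()

result = first_purchase_of_quick_repeaters(raw)
print(result)
```

In this sample only customer 1 qualifies: customer 2 has a single purchase, and customer 3's second purchase comes 60 days after the first.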

How to find customers who made a second purchase within 30 days?
