identifier gender Date category
0 1 female 2016-11-11 Baby
1 1 female 2017-02-01 Baby
2 2 female 2016-12-19 Shave
3 2 female 2016-12-27 Shave
4 3 female 2016-11-11 Baby
5 3 female 2016-11-22 Baby
6 4 male 2016-11-11 Shave
7 4 male 2017-01-01 Shave
As a result, I need the number of orders per day, split into first and second orders:
first order:
11.11.2016 3
19.12.2016 1
second orders:
22.11.2016 1
27.12.2016 1
01.01.2017 1
01.02.2017 1
third orders:
I also need to calculate the average time between orders (per person):
average time between orders = ...
I also want to evaluate customers' loyalty across categories. I think these tasks look similar:
Loyalty cross categories:
first order:
Baby 2
second order:
Baby - 2
third order:
first order:
Shave 2
second order:
Shave - 2
third order:
Is it possible to do this kind of analysis with pandas?
Answer 0 (score: 1)
Given this data frame:
identifier gender Date category
0 1 female 2016-11-11 Baby
1 1 female 2017-02-01 Baby
2 2 female 2016-12-19 Shave
3 2 female 2016-12-27 Shave
4 3 female 2016-11-11 Baby
5 3 female 2016-11-22 Baby
6 4 male 2016-11-11 Shave
7 4 male 2017-01-01 Shave
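You can first use Series.shift within a groupby to get each customer's previous order date (this assumes pandas is imported as pd and the rows are sorted by date within each identifier):
df['Date'] = pd.to_datetime(df['Date'])  # make sure the dates are datetimes, not strings
df_groups = df.groupby('identifier')
df['last_order'] = df_groups.Date.shift(1)  # previous order date of the same customer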
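For anyone who wants to run the shift step end to end, here is a minimal self-contained sketch. The code that builds the frame is my own reconstruction of the sample data (it is not in the question), and it assumes Date should be parsed with pd.to_datetime so that date arithmetic works later.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question (assumption).
df = pd.DataFrame({
    "identifier": [1, 1, 2, 2, 3, 3, 4, 4],
    "gender": ["female"] * 6 + ["male"] * 2,
    "Date": ["2016-11-11", "2017-02-01", "2016-12-19", "2016-12-27",
             "2016-11-11", "2016-11-22", "2016-11-11", "2017-01-01"],
    "category": ["Baby", "Baby", "Shave", "Shave",
                 "Baby", "Baby", "Shave", "Shave"],
})

# Parse the strings into datetimes and make sure each customer's
# orders are in chronological order before shifting.
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values(["identifier", "Date"])

# For every row, the date of the same customer's previous order
# (NaT for the customer's first order).
df["last_order"] = df.groupby("identifier")["Date"].shift(1)

print(df[["identifier", "Date", "last_order"]])
```

Each customer's first row gets NaT in last_order, because there is no earlier order to shift from.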
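Then you can get the time between orders (the current order date minus the previous one, so the gaps come out positive):
df['Time_between_orders'] = df['Date'] - df['last_order']
Then you can get the average time between orders for each user like this:
df_groups = df.groupby('identifier')
df_groups['Time_between_orders'].apply(lambda x: x.sum() / x.notnull().sum()).apply(lambda x: x.days)
This gives:
identifier
1 82
2 8
3 11
4 51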
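The same per-customer average can probably be written more compactly with a grouped diff. This is only a sketch of an alternative, not the method used above: the frame is rebuilt by hand from the question's table, and the names days, avg_days and overall_days are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question (assumption).
df = pd.DataFrame({
    "identifier": [1, 1, 2, 2, 3, 3, 4, 4],
    "Date": ["2016-11-11", "2017-02-01", "2016-12-19", "2016-12-27",
             "2016-11-11", "2016-11-22", "2016-11-11", "2017-01-01"],
})
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values(["identifier", "Date"])

# Gap in whole days between consecutive orders of the same customer
# (NaN for each customer's first order).
days = df.groupby("identifier")["Date"].diff().dt.days

# Average gap per customer; mean() skips the NaN rows.
avg_days = days.groupby(df["identifier"]).mean()

# Average gap over all customers' repeat orders.
overall_days = days.mean()

print(avg_days)
print(overall_days)  # 38.0 for this sample
```

Working with whole days as plain numbers keeps the averaging step free of any timedelta handling, at the cost of dropping sub-day precision (which this data does not have anyway).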
If you want this across categories, just add category to all of the groupby statements: df.groupby('identifier')
becomes df.groupby(['identifier', 'category'])
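The question also asks for counts of first and second orders per day and per category, which the steps above do not cover. One possible way, sketched here as my own suggestion rather than part of the answer, is to number each customer's orders with cumcount and then count by that number. The order_no column and the result names are hypothetical, and the frame is rebuilt by hand from the question's table.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question (assumption).
df = pd.DataFrame({
    "identifier": [1, 1, 2, 2, 3, 3, 4, 4],
    "Date": ["2016-11-11", "2017-02-01", "2016-12-19", "2016-12-27",
             "2016-11-11", "2016-11-22", "2016-11-11", "2017-01-01"],
    "category": ["Baby", "Baby", "Shave", "Shave",
                 "Baby", "Baby", "Shave", "Shave"],
})
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values(["identifier", "Date"])

# 1 for a customer's first order, 2 for the second, and so on.
df["order_no"] = df.groupby("identifier").cumcount() + 1

# How many first/second/... orders were placed on each date.
orders_per_day = df.groupby(["order_no", "Date"]).size()

# How many first/second/... orders fall into each category.
orders_per_category = df.groupby(["order_no", "category"]).size()

print(orders_per_day)
print(orders_per_category)
```

On the sample data this gives three first orders on 2016-11-11 and one on 2016-12-19, and two Baby plus two Shave orders at each order number. A third order would show up as order_no 3 without any change to the code.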