I have a Python pandas DataFrame df:
Customer_ID  Transaction_ID   Item_ID  date        trans_nr
ABC          2017-04-12-333   X8973    2017-04-12  1
ABC          2017-04-12-333   X2468    2017-04-12  1
ABC          2017-05-22-658   X2906    2017-05-22  2
ABC          2017-05-22-757   X8790    2017-05-22  2
ABC          2017-07-13-864   X8790    2017-07-13  3
BCD          2017-08-11-879   X2346    2017-08-11  1
BCD          2017-08-11-879   X2468    2017-08-11  1
CDE          2017-03-05-879   X8973    2017-03-05  1
CDE          2017-05-22-879   X2468    2017-05-22  2
CDE          2017-10-15-879   X2346    2017-10-15  3
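I basically need to pivot it to create an item funnel: in other words, to see how many customers bought each item on each visit, and which item they bought on their next visit.
This is the output I am looking for: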
item_1trans  count_customer  item_2trans  count_customer  item_3trans  count_customer
X8973        2               X2906        1               X8790        1
                             X8790        1               X8790        1
                             X2468        1               X2346        1
X2468        2               X2906        1               X8790        1
                             X8790        1               X8790        1
X2346        1
I managed to get there with some ugly code:
import pandas as pd

# One frame per visit number, with Item_ID renamed to that visit's column
df_first = df.loc[df['trans_nr'] == 1].copy()
df_first = df_first.rename(columns={'Item_ID': 'item_1trans'})
df_second = df.loc[df['trans_nr'] == 2].copy()
df_second = df_second.rename(columns={'Item_ID': 'item_2trans'})
df_third = df.loc[df['trans_nr'] == 3].copy()
df_third = df_third.rename(columns={'Item_ID': 'item_3trans'})
# Outer-join the visits on the customer so each item combination gets a row
df_step1 = pd.merge(df_first, df_second, how='outer', on=['Customer_ID'])
df_final = pd.merge(df_step1, df_third, how='outer', on=['Customer_ID'])
# Count distinct customers per (first, second, third) item combination
pd.pivot_table(df_final,
               index=['item_1trans', 'item_2trans', 'item_3trans'],
               values=['Customer_ID'],
               aggfunc=lambda x: len(x.unique())).to_csv('test.csv')
But there must be a smoother way.
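
One direction that might be smoother, as a minimal sketch rather than a definitive solution: build the per-visit frames in a loop, join them with `functools.reduce`, and count distinct customers per item sequence at each funnel depth. The helper names `build_paths` and `level_counts` and the `max_visit` parameter are hypothetical, introduced only for illustration. The sketch assumes `trans_nr` is sequential per customer, so nobody has a third visit without a second, and it rebuilds only the three columns of the sample data that the funnel needs.

```python
from functools import reduce

import pandas as pd

# Sample data from above, reduced to the columns the funnel uses.
rows = [
    ("ABC", "X8973", 1), ("ABC", "X2468", 1),
    ("ABC", "X2906", 2), ("ABC", "X8790", 2),
    ("ABC", "X8790", 3),
    ("BCD", "X2346", 1), ("BCD", "X2468", 1),
    ("CDE", "X8973", 1), ("CDE", "X2468", 2), ("CDE", "X2346", 3),
]
df = pd.DataFrame(rows, columns=["Customer_ID", "Item_ID", "trans_nr"])


def build_paths(df, max_visit=3):
    """Return one row per customer and item combination over visits 1..max_visit."""
    visits = [
        df.loc[df["trans_nr"] == n, ["Customer_ID", "Item_ID"]]
          .rename(columns={"Item_ID": f"item_{n}trans"})
        for n in range(1, max_visit + 1)
    ]
    # Left-join each later visit onto the earlier ones; customers who
    # stopped buying keep NaN in the later item columns.
    return reduce(
        lambda left, right: left.merge(right, on="Customer_ID", how="left"),
        visits,
    )


def level_counts(paths, level):
    """Count distinct customers per item sequence over the first `level` visits."""
    cols = [f"item_{n}trans" for n in range(1, level + 1)]
    return (
        paths.dropna(subset=cols)          # keep customers who reached this visit
             .groupby(cols)["Customer_ID"]
             .nunique()
             .rename("count_customer")
             .reset_index()
    )


paths = build_paths(df)
first = level_counts(paths, 1)   # counts per first-visit item
second = level_counts(paths, 2)  # counts per (first, second) item pair
third = level_counts(paths, 3)   # counts per full three-visit sequence
```

On the sample data, `first` gives 2 customers for X8973, 2 for X2468 and 1 for X2346, which agrees with the desired output. Each level comes back as its own long-format frame instead of one wide table; merging the three frames on their shared item columns would bring them side by side if that layout is needed.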