Computing a funnel with Python pandas

Time: 2017-10-17 21:34:33

Tags: python pandas group-by pivot

Given a pandas DataFrame df:

Customer_ID | Transaction_ID  | Item_ID | date        | trans_nr
ABC           2017-04-12-333    X8973     2017-04-12     1
ABC           2017-04-12-333    X2468     2017-04-12     1
ABC           2017-05-22-658    X2906     2017-05-22     2
ABC           2017-05-22-757    X8790     2017-05-22     2
ABC           2017-07-13-864    X8790     2017-07-13     3
BCD           2017-08-11-879    X2346     2017-08-11     1
BCD           2017-08-11-879    X2468     2017-08-11     1
CDE           2017-03-05-879    X8973     2017-03-05     1
CDE           2017-05-22-879    X2468     2017-05-22     2
CDE           2017-10-15-879    X2346     2017-10-15     3
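For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: keeping date as plain strings is my assumption, since the question does not say which dtypes the real frame uses.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: dates are kept as strings; the real dtype is not stated.
df = pd.DataFrame({
    "Customer_ID": ["ABC"] * 5 + ["BCD"] * 2 + ["CDE"] * 3,
    "Transaction_ID": [
        "2017-04-12-333", "2017-04-12-333", "2017-05-22-658",
        "2017-05-22-757", "2017-07-13-864", "2017-08-11-879",
        "2017-08-11-879", "2017-03-05-879", "2017-05-22-879",
        "2017-10-15-879",
    ],
    "Item_ID": ["X8973", "X2468", "X2906", "X8790", "X8790",
                "X2346", "X2468", "X8973", "X2468", "X2346"],
    "date": ["2017-04-12", "2017-04-12", "2017-05-22", "2017-05-22",
             "2017-07-13", "2017-08-11", "2017-08-11", "2017-03-05",
             "2017-05-22", "2017-10-15"],
    "trans_nr": [1, 1, 2, 2, 3, 1, 1, 1, 2, 3],
})

print(df)
```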

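I basically need to pivot it to create an item funnel. In other words, I want to see how many customers bought each item on each visit, and which item they bought on their next visit.

This is the output I'm looking for: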
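The first half of that (how many customers bought each item on each visit) can be sketched with a plain groupby. This is only an illustration on the sample data, reduced to the three columns it needs. It does not yet link one visit to the next.

```python
import pandas as pd

# Sample data from the question, reduced to the columns used here.
df = pd.DataFrame({
    "Customer_ID": ["ABC"] * 5 + ["BCD"] * 2 + ["CDE"] * 3,
    "Item_ID": ["X8973", "X2468", "X2906", "X8790", "X8790",
                "X2346", "X2468", "X8973", "X2468", "X2346"],
    "trans_nr": [1, 1, 2, 2, 3, 1, 1, 1, 2, 3],
})

# Distinct customers per (visit number, item).
counts = df.groupby(["trans_nr", "Item_ID"])["Customer_ID"].nunique()
print(counts)
```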

item_1trans|count_customer|item_2trans|count_customer|item_3trans|count_customer
X8973          2            X2906            1          X8790           1
                            X8790            1          X8790           1
                            X2468            1          X2346           1
X2468          2            X2906            1          X8790           1
                            X8790            1          X8790           1
X2346          1    

I managed to get there with some ugly code:

import pandas as pd

# One frame per transaction number, with Item_ID renamed for that step
df_first = df.loc[df['trans_nr'] == 1].copy()
df_first = df_first.rename(columns={'Item_ID': 'item_1trans'})

df_second = df.loc[df['trans_nr'] == 2].copy()
df_second = df_second.rename(columns={'Item_ID': 'item_2trans'})

df_third = df.loc[df['trans_nr'] == 3].copy()
df_third = df_third.rename(columns={'Item_ID': 'item_3trans'})

# Join the three steps per customer
df_step1 = pd.merge(df_first, df_second, how='outer', on=['Customer_ID'])
df_final = pd.merge(df_step1, df_third, how='outer', on=['Customer_ID'])

# Count distinct customers per item path and write the result to CSV
pd.pivot_table(
    df_final,
    index=['item_1trans', 'item_2trans', 'item_3trans'],
    values=['Customer_ID'],
    aggfunc=lambda x: len(x.unique()),
).to_csv('test.csv')

But there must be a smoother way.
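One possible direction, offered as a sketch rather than a definitive answer: build one small frame per trans_nr in a loop, left-join them on Customer_ID, then count distinct customers at each depth of the path. The count_customer_1/2/3 column names and the "(none)" placeholder for customers with no further purchase are my own choices, not something from the question. The desired output above repeats the name count_customer, which is awkward to do in a DataFrame.

```python
from functools import reduce

import pandas as pd

# Sample data from the question, reduced to the columns used here.
df = pd.DataFrame({
    "Customer_ID": ["ABC"] * 5 + ["BCD"] * 2 + ["CDE"] * 3,
    "Item_ID": ["X8973", "X2468", "X2906", "X8790", "X8790",
                "X2346", "X2468", "X8973", "X2468", "X2346"],
    "trans_nr": [1, 1, 2, 2, 3, 1, 1, 1, 2, 3],
})

steps = sorted(int(n) for n in df["trans_nr"].unique())
item_cols = [f"item_{n}trans" for n in steps]

# One (customer, item) frame per transaction number.
per_step = [
    df.loc[df["trans_nr"] == n, ["Customer_ID", "Item_ID"]]
    .drop_duplicates()
    .rename(columns={"Item_ID": f"item_{n}trans"})
    for n in steps
]

# Left-join the steps so every first-visit item keeps its row,
# even when the customer never came back.
paths = reduce(
    lambda left, right: left.merge(right, on="Customer_ID", how="left"),
    per_step,
)
# Hypothetical placeholder so paths that stop early are still grouped.
paths = paths.fillna("(none)")

# Distinct customers sharing the same item path up to each depth.
for depth, n in enumerate(steps, start=1):
    paths[f"count_customer_{n}"] = (
        paths.groupby(item_cols[:depth])["Customer_ID"].transform("nunique")
    )

ordered = [c for n in steps for c in (f"item_{n}trans", f"count_customer_{n}")]
funnel = (
    paths[ordered]
    .drop_duplicates()
    .sort_values(item_cols)
    .reset_index(drop=True)
)
print(funnel)
```

Because the steps are built in a loop, the same code should extend to more than three transactions without adding another merge by hand. Note that it lists every path, including customers who stopped after the first visit, so it will not match the hand-written table above row for row.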

0 answers:

No answers yet