Input
cust_Id category product purchased
1 Elec light 0
1 Elec light 1
1 Elec light 0
1 HA Table 1
1 HH Pen 1
2 Elec light 0
2 HA Table 1
3 HH Pen 0
3 Elec light 1
I want to find the best customer, category, and product based on the highest purchase probability.
Answer 0 (score: 1)
Try this:
grp = df.groupby(['cust_Id', 'category', 'product'])
prob = grp.sum() / grp.count()
The result is the probability of a purchase for each specific combination of the three attributes:
purchased
cust_Id category product
1 Elec light 0.333333
HA Table 1.000000
HH Pen 1.000000
2 Elec light 0.000000
HA Table 1.000000
3 Elec light 1.000000
HH Pen 0.000000
The probability that they buy nothing is simply the complement (i.e. 1 - prob).
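To get from these probabilities to the "best" combinations the question asks about, one option is to keep every row that reaches the maximum. This is only a sketch: the DataFrame is rebuilt from the input shown in the question, and the name `best` is introduced here for illustration. Note that ties are possible, so more than one combination can come back.

```python
import pandas as pd

# Rebuild the sample input from the question.
df = pd.DataFrame({
    "cust_Id": [1, 1, 1, 1, 1, 2, 2, 3, 3],
    "category": ["Elec", "Elec", "Elec", "HA", "HH", "Elec", "HA", "HH", "Elec"],
    "product": ["light", "light", "light", "Table", "Pen", "light", "Table", "Pen", "light"],
    "purchased": [0, 1, 0, 1, 1, 0, 1, 0, 1],
})

# Purchase probability per (cust_Id, category, product) combination.
prob = df.groupby(["cust_Id", "category", "product"])["purchased"].mean()

# Keep every combination that reaches the maximum probability (ties included).
best = prob[prob == prob.max()]
print(best)
```

With this sample data several combinations tie at the maximum, so a tie-breaking rule (for example, the number of observations per group) would be needed to pick a single winner.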
Answer 1 (score: 1)
Dividing sum by count is the definition of mean, so use:
out1 = df.groupby(['cust_Id', 'category', 'product'], as_index=False)['purchased'].mean()
# Equivalent, with an explicit flag for purchased == 1
out1 = (df.assign(one = df['purchased'].eq(1))
.groupby(['cust_Id', 'category', 'product'], as_index=False)['one'].mean())
If you want the share of 0 values (no purchase) instead:
out0 = (df.assign(zero = df['purchased'].eq(0))
.groupby(['cust_Id', 'category', 'product'], as_index=False)['zero'].mean())
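As a quick sanity check, the share of 0 values and the share of 1 values should add up to 1 in every group. The sketch below rebuilds the question's input and averages a boolean `zero` flag column; the names `keys` and `merged` are assumptions made for this example only.

```python
import pandas as pd

# Rebuild the sample input from the question.
df = pd.DataFrame({
    "cust_Id": [1, 1, 1, 1, 1, 2, 2, 3, 3],
    "category": ["Elec", "Elec", "Elec", "HA", "HH", "Elec", "HA", "HH", "Elec"],
    "product": ["light", "light", "light", "Table", "Pen", "light", "Table", "Pen", "light"],
    "purchased": [0, 1, 0, 1, 1, 0, 1, 0, 1],
})
keys = ["cust_Id", "category", "product"]

# Share of purchases (1 values) per combination.
out1 = df.groupby(keys, as_index=False)["purchased"].mean()

# Share of non-purchases (0 values): average a boolean flag column.
out0 = (df.assign(zero=df["purchased"].eq(0))
          .groupby(keys, as_index=False)["zero"].mean())

# Side by side: the two shares are complements of each other.
merged = out1.merge(out0, on=keys)
print(merged)
```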