Calculating conditional probability

Time: 2020-03-15 08:23:00

Tags: python-3.x pandas probability

Input

cust_Id  category  product  purchased
1        Elec      light    0    
1        Elec      light    1
1        Elec      light    0
1        HA        Table    1
1        HH        Pen      1
2        Elec      light    0
2        HA        Table    1
3        HH        Pen      0
3        Elec      light    1

I want to find the best customer, category, and product based on the highest probability value.

2 answers:

Answer 0 (score: 1)

Try this:

grp = df.groupby(['cust_Id', 'category', 'product'])
prob = grp.sum() / grp.count()

The result is the probability of a purchase for each specific combination of the three attributes:

                          purchased
cust_Id category product           
1       Elec     light     0.333333
        HA       Table     1.000000
        HH       Pen       1.000000
2       Elec     light     0.000000
        HA       Table     1.000000
3       Elec     light     1.000000
        HH       Pen       0.000000

The probability that they do not buy anything is simply the complement (i.e. 1 - prob).
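To get from these probabilities to the "best" combination the question asks about, one option is to keep every row that reaches the maximum. The following is a minimal sketch, assuming the input table from the question is loaded into a DataFrame named `df`; the names `best` and `top` are introduced here for illustration and are not part of the answer.

```python
import pandas as pd

# Rebuild the input table from the question.
df = pd.DataFrame({
    "cust_Id":   [1, 1, 1, 1, 1, 2, 2, 3, 3],
    "category":  ["Elec", "Elec", "Elec", "HA", "HH", "Elec", "HA", "HH", "Elec"],
    "product":   ["light", "light", "light", "Table", "Pen", "light", "Table", "Pen", "light"],
    "purchased": [0, 1, 0, 1, 1, 0, 1, 0, 1],
})

# Purchase probability per (cust_Id, category, product) combination.
grp = df.groupby(["cust_Id", "category", "product"])
prob = grp.sum() / grp.count()

# Highest probability reached by any combination.
best = prob["purchased"].max()

# Several combinations can tie, so keep every row that reaches the maximum.
top = prob[prob["purchased"] == best]
print(top)
```

Filtering on the maximum is used here instead of `idxmax()` because `idxmax()` reports only the first of several tied rows, and with this data more than one combination reaches the top value.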

Answer 1 (score: 1)

Dividing the sum by the count is the definition of the mean, so use:

out1 = df.groupby(['cust_Id', 'category', 'product'], as_index=False)['purchased'].mean()

# Equivalent form with an explicit indicator column for purchased == 1
out1 = (df.assign(one = df['purchased'].eq(1))
          .groupby(['cust_Id', 'category', 'product'], as_index=False)['one'].mean())

If you want the proportion of 0 values instead:

out0 = (df.assign(zero = df['purchased'].eq(0))
          .groupby(['cust_Id', 'category', 'product'], as_index=False)['zero'].mean())
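As a quick sanity check, the share of 0 values and the share of 1 values should add up to 1 in every group. Below is a minimal sketch, assuming the same input table as in the question; `keys` and `check` are names introduced here for illustration.

```python
import pandas as pd

# Rebuild the input table from the question.
df = pd.DataFrame({
    "cust_Id":   [1, 1, 1, 1, 1, 2, 2, 3, 3],
    "category":  ["Elec", "Elec", "Elec", "HA", "HH", "Elec", "HA", "HH", "Elec"],
    "product":   ["light", "light", "light", "Table", "Pen", "light", "Table", "Pen", "light"],
    "purchased": [0, 1, 0, 1, 1, 0, 1, 0, 1],
})

keys = ["cust_Id", "category", "product"]

# Share of rows with purchased == 1 in each group.
out1 = df.groupby(keys, as_index=False)["purchased"].mean()

# Share of rows with purchased == 0 in each group,
# computed from a boolean indicator column.
out0 = (df.assign(zero=df["purchased"].eq(0))
          .groupby(keys, as_index=False)["zero"].mean())

# Both results are sorted by the same keys, so rows line up by position.
check = out1["purchased"] + out0["zero"]
print(check.round(6).eq(1).all())
```

Rounding before the comparison avoids false mismatches from floating-point sums such as 1/3 + 2/3.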