Question

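How can I find the probability of occurrence for the pandas df below? I am trying to find the likelihood that a beer is associated with a particular store. My current event window is one day.

I have a DataFrame like the following: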

eventtime                   name         src_store    
January 14, 2018 4:57:35    budlight     NaN
January 14, 2018 4:51:31    coors        5-119
January 14, 2018 4:31:32    pabst        NaN
January 14, 2018 4:57:31    budlight     5-118
January 14, 2018 4:58:21    coors        5-119
January 14, 2018 4:57:37    NaN          5-120
January 14, 2018 4:18:31    budlight     5-118
January 14, 2018 4:57:31    coors        5-119
January 14, 2018 4:57:52    NaN          5-120

Some code that gives me a comparison matrix:

pd.crosstab(df['name'].fillna('NONE'), df['src_store'].fillna('NONE'))

    src_store  5-118  5-119  5-120  NONE
name                                
NONE           0      0      2     0
budlight       2      0      0     1
coors          0      3      0     0
pabst          0      0      0     1

I am trying to get p-values from this:

Name with    src_store
Name without src_store
src_store with    name
src_store without name
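
One possible reading of the four counts above is a 2x2 contingency table for a single (name, src_store) pair, which could be passed to Fisher's exact test. This is only a minimal sketch: the helper `pair_table` is hypothetical, the sample data is retyped from the table in the question, missing values are assumed to be replaced by the placeholder string `NONE`, and Fisher's exact test is my assumption about which test fits, not something the data dictates.

```python
import pandas as pd
from scipy.stats import fisher_exact

# Sample data retyped from the question; NaN is replaced by the
# placeholder "NONE" (an assumption matching the crosstab output).
df = pd.DataFrame({
    "name": ["budlight", "coors", "pabst", "budlight", "coors",
             None, "budlight", "coors", None],
    "src_store": [None, "5-119", None, "5-118", "5-119",
                  "5-120", "5-118", "5-119", "5-120"],
}).fillna("NONE")


def pair_table(frame, name, store):
    """Hypothetical helper: 2x2 counts for one (name, store) pair."""
    is_name = frame["name"] == name
    is_store = frame["src_store"] == store
    return [
        # name with store,        name without store
        [int((is_name & is_store).sum()), int((is_name & ~is_store).sum())],
        # other names with store, other names without store
        [int((~is_name & is_store).sum()), int((~is_name & ~is_store).sum())],
    ]


table = pair_table(df, "coors", "5-119")
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(table, p_value)
```

With only nine events, any p-value from this table says very little; the sketch is meant to show the shape of the calculation rather than a result to rely on.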

The overall goal is to find the probability that a beer is associated with a particular src_store.
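
If "probability" here means the share of a beer's events that come from each store, one way to estimate it is to row-normalize the crosstab. A minimal sketch, assuming missing values are replaced by the placeholder string `NONE` and that empirical frequencies over one day of data are an acceptable estimate:

```python
import pandas as pd

# Sample data retyped from the question; NaN is replaced by "NONE"
# so that missing values show up as their own category (an assumption).
df = pd.DataFrame({
    "name": ["budlight", "coors", "pabst", "budlight", "coors",
             None, "budlight", "coors", None],
    "src_store": [None, "5-119", None, "5-118", "5-119",
                  "5-120", "5-118", "5-119", "5-120"],
}).fillna("NONE")

# normalize="index" divides each row by its row total, giving the
# empirical estimate of P(src_store | name).
probs = pd.crosstab(df["name"], df["src_store"], normalize="index")
print(probs.round(3))
```

Each row of `probs` sums to 1. Passing `normalize="columns"` instead would give the estimate of P(name | src_store).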

Expected output (NOT actual p-values):

eventtime                   name         src_store    p_value
January 14, 2018 4:57:35    budlight     NaN          0.01
January 14, 2018 4:51:31    coors        5-119        0.02
January 14, 2018 4:31:32    pabst        NaN          0
January 14, 2018 4:57:31    budlight     5-118        0.002
January 14, 2018 4:58:21    coors        5-119        0.004
January 14, 2018 4:57:37    NaN          5-120        0.005
January 14, 2018 4:18:31    budlight     5-118        0.006
January 14, 2018 4:57:31    coors        5-119        0.007
January 14, 2018 4:57:52    NaN          5-120        0.008
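
A per-row `p_value` column like the one above could be produced by computing one p-value per distinct (name, src_store) pair and mapping it back onto the rows. This is a hedged sketch, not a definitive method: `pair_p_value` is a hypothetical helper, Fisher's exact test on a 2x2 table is an assumed choice of test, and missing values are assumed to be replaced by the placeholder `NONE`.

```python
import pandas as pd
from scipy.stats import fisher_exact

# Sample data retyped from the question; NaN becomes "NONE" (assumption).
df = pd.DataFrame({
    "eventtime": [
        "January 14, 2018 4:57:35", "January 14, 2018 4:51:31",
        "January 14, 2018 4:31:32", "January 14, 2018 4:57:31",
        "January 14, 2018 4:58:21", "January 14, 2018 4:57:37",
        "January 14, 2018 4:18:31", "January 14, 2018 4:57:31",
        "January 14, 2018 4:57:52",
    ],
    "name": ["budlight", "coors", "pabst", "budlight", "coors",
             None, "budlight", "coors", None],
    "src_store": [None, "5-119", None, "5-118", "5-119",
                  "5-120", "5-118", "5-119", "5-120"],
}).fillna("NONE")


def pair_p_value(frame, name, store):
    """Hypothetical helper: Fisher exact p-value for one (name, store) pair."""
    is_name = frame["name"] == name
    is_store = frame["src_store"] == store
    table = [
        [int((is_name & is_store).sum()), int((is_name & ~is_store).sum())],
        [int((~is_name & is_store).sum()), int((~is_name & ~is_store).sum())],
    ]
    _, p = fisher_exact(table, alternative="two-sided")
    return p


# Compute each distinct pair once, then look the value up for every row.
pairs = df[["name", "src_store"]].drop_duplicates()
p_by_pair = {
    (n, s): pair_p_value(df, n, s)
    for n, s in pairs.itertuples(index=False)
}
df["p_value"] = [p_by_pair[(n, s)] for n, s in zip(df["name"], df["src_store"])]
print(df)
```

Rows that share the same (name, src_store) pair get the same value, and because many pairs are tested at once, the values would likely need a multiple-comparison correction before being interpreted.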

Find probability of occurrence pandas scipy

0 answers: