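How can I find the probability of occurrence in the pandas df below? I am trying to find the likelihood that a beer is associated with a given store. My current event window is one day.
I have a DataFrame like this: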
eventtime                  name      src_store
January 14, 2018 4:57:35   budlight  NaN
January 14, 2018 4:51:31   coors     5-119
January 14, 2018 4:31:32   pabst     NaN
January 14, 2018 4:57:31   budlight  5-118
January 14, 2018 4:58:21   coors     5-119
January 14, 2018 4:57:37   NaN       5-120
January 14, 2018 4:18:31   budlight  5-118
January 14, 2018 4:57:31   coors     5-119
January 14, 2018 4:57:52   NaN       5-120
Some code that gives me a comparison matrix:
pd.crosstab(df.name.fillna('NONE'), df.src_store.fillna('NONE'))
src_store  5-118  5-119  5-120  NONE
name
NONE           0      0      2     0
budlight       2      0      0     1
coors          0      3      0     0
pabst          0      0      0     1
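As a rough sketch of how the matrix above could be turned into probabilities, the counts can be row-normalised so that each row gives the empirical P(src_store | name). The sample frame is rebuilt from the data in the question; the "NONE" label for missing values and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; NaN marks a missing value.
df = pd.DataFrame({
    "name": ["budlight", "coors", "pabst", "budlight", "coors",
             np.nan, "budlight", "coors", np.nan],
    "src_store": [np.nan, "5-119", np.nan, "5-118", "5-119",
                  "5-120", "5-118", "5-119", "5-120"],
})

# Assumption: missing values are treated as their own "NONE" category.
name = df["name"].fillna("NONE")
store = df["src_store"].fillna("NONE")

# Raw co-occurrence counts (the comparison matrix).
counts = pd.crosstab(name, store)

# Row-normalised counts: each row sums to 1, i.e. empirical P(src_store | name).
p_store_given_name = pd.crosstab(name, store, normalize="index")

print(p_store_given_name.round(3))
```

Passing normalize="columns" instead would give P(name | src_store), and normalize="all" the joint frequencies. These are observed frequencies, not p-values from a statistical test.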
I am trying to get p-values from this:
Name with src_store
Name without src_store
src_store with name
src_store without name
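One possible reading of these four quantities is as the cells of a 2x2 contingency table for a single (name, src_store) pair. The sketch below counts them for one pair; `pair_counts` is a hypothetical helper, and treating rows with a missing name or store as "not this name" / "not this store" is an assumption.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["budlight", "coors", "pabst", "budlight", "coors",
             np.nan, "budlight", "coors", np.nan],
    "src_store": [np.nan, "5-119", np.nan, "5-118", "5-119",
                  "5-120", "5-118", "5-119", "5-120"],
})

def pair_counts(frame, beer, store):
    """Hypothetical helper: 2x2 cell counts for one (name, src_store) pair."""
    # .eq() returns False for NaN, so missing values count as "not a match".
    has_name = frame["name"].eq(beer)
    has_store = frame["src_store"].eq(store)
    return {
        "name_with_store": int((has_name & has_store).sum()),
        "name_without_store": int((has_name & ~has_store).sum()),
        "store_without_name": int((~has_name & has_store).sum()),
        "neither": int((~has_name & ~has_store).sum()),
    }

cells = pair_counts(df, "budlight", "5-118")
print(cells)
```

A 2x2 table built from these counts is the kind of input that a test of association such as scipy.stats.fisher_exact accepts, if a real p-value per pair is what is needed. With only nine rows, any such test will have very little power.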
The overall goal is to find the probability that a beer is associated with a particular src_store.
Expected output (NOT actual p_values):
eventtime                  name      src_store  p_value
January 14, 2018 4:57:35   budlight  NaN        0.01
January 14, 2018 4:51:31   coors     5-119      0.02
January 14, 2018 4:31:32   pabst     NaN        0
January 14, 2018 4:57:31   budlight  5-118      0.002
January 14, 2018 4:58:21   coors     5-119      0.004
January 14, 2018 4:57:37   NaN       5-120      0.005
January 14, 2018 4:18:31   budlight  5-118      0.006
January 14, 2018 4:57:31   coors     5-119      0.007
January 14, 2018 4:57:52   NaN       5-120      0.008
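If a per-row probability is acceptable in place of a formal p-value, one rough option is to attach the empirical conditional frequency P(src_store | name) to every row. This is only a sketch: the column name `p_store_given_name` and the "NONE" label are assumptions, and the numbers it produces will not match the placeholder values in the expected output.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "eventtime": [
        "January 14, 2018 4:57:35", "January 14, 2018 4:51:31",
        "January 14, 2018 4:31:32", "January 14, 2018 4:57:31",
        "January 14, 2018 4:58:21", "January 14, 2018 4:57:37",
        "January 14, 2018 4:18:31", "January 14, 2018 4:57:31",
        "January 14, 2018 4:57:52",
    ],
    "name": ["budlight", "coors", "pabst", "budlight", "coors",
             np.nan, "budlight", "coors", np.nan],
    "src_store": [np.nan, "5-119", np.nan, "5-118", "5-119",
                  "5-120", "5-118", "5-119", "5-120"],
})

# Assumption: missing values form their own "NONE" category.
filled = df[["name", "src_store"]].fillna("NONE")

# Rows per (name, src_store) pair and rows per name, broadcast back to each row.
pair_n = filled.groupby(["name", "src_store"])["src_store"].transform("size")
name_n = filled.groupby("name")["src_store"].transform("size")

# Empirical P(src_store | name) for the pair observed on each row.
df["p_store_given_name"] = pair_n / name_n

print(df[["name", "src_store", "p_store_given_name"]])
```

To respect the one-day event window, the date (for example `pd.to_datetime(df["eventtime"]).dt.date`) could be added as an extra grouping key in both `groupby` calls; in the sample all rows fall on the same day, so it makes no difference here.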