How can I implement a probability marginalization function on a pandas DataFrame?

Asked: 2015-05-07 09:52:17

Tags: python pandas

I have a probability table like this:

        import numpy as np
        import pandas as pd
        BC_array = [np.array(['B=n', 'B=m', 'B=s', 'B=n', 'B=m', 'B=s']), np.array(['C=F', 'C=F', 'C=F', 'C=T', 'C=T', 'C=T'])]
        pD_BC_array = np.array([[0.9, 0.8, 0.1, 0.3, 0.4, 0.01], [0.08, 0.17, 0.01, 0.05, 0.05, 0.01], [0.01, 0.01, 0.87, 0.05, 0.15, 0.97], [0.01, 0.02, 0.02, 0.6, 0.4, 0.01]])
        pD_BC = pd.DataFrame(pD_BC_array, index=['D=h', 'D=c', 'D=s', 'D=r'], columns=BC_array)
      B=n   B=m   B=s   B=n   B=m   B=s
      C=F   C=F   C=F   C=T   C=T   C=T
D=h  0.90  0.80  0.10  0.30  0.40  0.01
D=c  0.08  0.17  0.01  0.05  0.05  0.01
D=s  0.01  0.01  0.87  0.05  0.15  0.97
D=r  0.01  0.02  0.02  0.60  0.40  0.01

How can I marginalize out C (that is, add the 'C=F' and 'C=T' columns together for each value of B) and get the table:

      B=n   B=m   B=s 
D=h  1.20  1.20  0.11  
D=c  0.13  0.22  0.02 
D=s  0.06  0.16  1.84 
D=r  0.61  0.42  0.03 
like this?

1 Answer:

Answer 0 (score: 1)

You can call sum on the df, passing axis=1 to sum along each row and level=0 to group the sum by that level of the column index:

In [259]:

pD_BC.sum(axis=1, level=0)
Out[259]:
      B=m   B=n   B=s
D=h  1.20  1.20  0.11
D=c  0.22  0.13  0.02
D=s  0.16  0.06  1.84
D=r  0.42  0.61  0.03
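
Two things are worth noting about the output above. The columns come back sorted (B=m, B=n, B=s) rather than in the order asked for in the question. Also, as far as I know, the level argument of sum was deprecated in pandas 1.3 and removed in 2.0, so the call above may fail on a recent installation. The following is a minimal sketch of an equivalent that should work on both old and new versions; it is not the answerer's code. It transposes the frame, groups on the first index level, sums, and transposes back. The names order and marginal are hypothetical and only used for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the table from the question: columns are a two-level index (B, C).
BC_array = [np.array(['B=n', 'B=m', 'B=s', 'B=n', 'B=m', 'B=s']),
            np.array(['C=F', 'C=F', 'C=F', 'C=T', 'C=T', 'C=T'])]
pD_BC_array = np.array([[0.9, 0.8, 0.1, 0.3, 0.4, 0.01],
                        [0.08, 0.17, 0.01, 0.05, 0.05, 0.01],
                        [0.01, 0.01, 0.87, 0.05, 0.15, 0.97],
                        [0.01, 0.02, 0.02, 0.6, 0.4, 0.01]])
pD_BC = pd.DataFrame(pD_BC_array,
                     index=['D=h', 'D=c', 'D=s', 'D=r'],
                     columns=BC_array)

# Remember the order in which the B values first appear (assumed to be
# the order wanted in the result: B=n, B=m, B=s).
order = pD_BC.columns.get_level_values(0).unique()

# Transpose so (B, C) becomes the row index, sum within each B group,
# transpose back, and restore the original column order.
marginal = pD_BC.T.groupby(level=0).sum().T.reindex(columns=order)

print(marginal.round(2))
```

Transposing first avoids relying on column-wise grouping, which I believe is also being phased out in newer pandas releases. The reindex step is only needed if the original B order matters.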