Pandas groupby division error

Time: 2018-06-02 00:28:51

Tags: python pandas pandas-groupby

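Given the sample data below, I am trying to compute conditional probabilities. The columns represent events A-E occurring in a sequence (S1, S2, ...).

The first two examples work as expected and compute P(S2 | S1) and P(S2, S3 | S1). The approach breaks when the condition spans more than one column, as in the third example, where I expected to get P(S3 | S1, S2).

I would appreciate any insight into why this does not work and which alternative approach would give the desired result for P(S3 | S1, S2). For example, I would expect the output to contain the rows A, D, B, 0.25 and A, D, C, 0.75.

Thanks!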

MWE code:

import pandas as pd

data = { 'S1' : ['A','A','A','B','B','A','A'],
         'S2' : ['B','D','D','A','D','D','D'],
         'S3' : ['C','C','C','D','C','B','C'],
         'S4' : ['D','B','E','C','A','C','E'] }

df = pd.DataFrame(data)

print (df)
print ((df.groupby(['S1','S2']).agg({'S4':'count'}) /
        df.groupby('S1').agg({'S4':'count'})).rename(columns={'S4':'Freq'}))
print ((df.groupby(['S1','S2','S3']).agg({'S4':'count'}) / 
        df.groupby('S1').agg({'S4':'count'})).rename(columns={'S4':'Freq'}))
print ((df.groupby(['S1','S2','S3']).agg({'S4':'count'}) / 
       df.groupby(['S1','S2']).agg({'S4':'count'})).rename(columns={'S4':'Freq'}))

Output:

  S1 S2 S3 S4
0  A  B  C  D
1  A  D  C  B
2  A  D  C  E
3  B  A  D  C
4  B  D  C  A
5  A  D  B  C
6  A  D  C  E
       Freq
S1 S2      
A  B    0.2
   D    0.8
B  A    0.5
   D    0.5
          Freq
S1 S2 S3      
A  B  C    0.2
   D  B    0.2
      C    0.6
B  A  D    0.5
   D  C    0.5
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    print ((df.groupby(['S1','S2','S3']).agg({'S4':'count'}) / df.groupby(['S1','S2']).agg({'S4':'count'})).rename(columns={'S4':'Freq'}))
NotImplementedError: merging with more than one level overlap on a multi-index is not implemented
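
The error message suggests that, in the pandas version used here, dividing two frames whose MultiIndexes share more than one level (S1 and S2) is not supported. One possible way around it, offered only as a sketch and not as the only approach, is to avoid aligning two differently indexed frames at all: count each (S1, S2, S3) combination once, then divide by the per-(S1, S2) totals broadcast back onto the same index with transform. The names joint, cond_total and freq are illustrative and not part of the original code.

```python
import pandas as pd

data = {'S1': ['A', 'A', 'A', 'B', 'B', 'A', 'A'],
        'S2': ['B', 'D', 'D', 'A', 'D', 'D', 'D'],
        'S3': ['C', 'C', 'C', 'D', 'C', 'B', 'C'],
        'S4': ['D', 'B', 'E', 'C', 'A', 'C', 'E']}

df = pd.DataFrame(data)

# Number of rows for each (S1, S2, S3) combination.
joint = df.groupby(['S1', 'S2', 'S3']).size()

# Total rows per (S1, S2), repeated for every S3 value so that the result
# has exactly the same index as `joint` (no cross-index alignment needed).
cond_total = joint.groupby(level=['S1', 'S2']).transform('sum')

# Estimate of P(S3 | S1, S2) as a one-column frame named 'Freq'.
freq = (joint / cond_total).rename('Freq').to_frame()
print(freq)
```

With the sample data, the (A, D) group has four rows, one with S3 = B and three with S3 = C, so the rows A, D, B and A, D, C come out as 0.25 and 0.75.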

0 answers:

No answers