Pandas groupby and conditional ratios

Time: 2018-05-22 16:13:37

Tags: python-3.x pandas

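I want to compute a ratio of counts based on a condition, and I am struggling to get it right with a pandas DataFrame.

The data looks like this: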

                   JOB_ROLE           COMMENTS ACTIVITY_TYPE  COUNTS  
             Director-Level  Meeting Requested     EmailSend     490    
              Manager-Level  Meeting Requested     EmailSend     305  
             Non-Managerial  Meeting Requested     EmailSend     272  
     Top Executive; C-Level  Meeting Requested     EmailSend     226  
                   VP-Level  Meeting Requested     EmailSend     185
             Director-Level  Meeting Requested    FormSubmit     131
              Manager-Level  Meeting Requested    FormSubmit      74
     Top Executive; C-Level  Meeting Requested    FormSubmit      61
                   VP-Level  Meeting Requested    FormSubmit      53
             Non-Managerial  Meeting Requested    FormSubmit      52
                      Other  Meeting Requested     EmailSend      20
                      Other  Meeting Requested    FormSubmit       2

My attempt:

ratios =  mr_jr.groupby('JOB_ROLE').apply(lambda x: x[x['ACTIVITY_TYPE']=='FormSubmit'].COUNTS / x[x['ACTIVITY_TYPE']=='EmailSend'].COUNTS)

What is the correct way to apply a condition within each group and then do arithmetic on the result?

Many thanks in advance.

EDITED

Desired output:

print(list(ratios)) # prints: [0.26, 0.24, 0.19, 0.27, 0.28, 0.1]

1 answer:

Answer 0: (score: 2)

This looks like a job for a pivot table.

piv = df.pivot(index='JOB_ROLE', columns='ACTIVITY_TYPE', values='COUNTS')

Output:

In [119]: piv.FormSubmit / piv.EmailSend
Out[119]: 
JOB_ROLE
Director-Level            0.267347
Manager-Level             0.242623
Non-Managerial            0.191176
Other                     0.100000
Top Executive; C-Level    0.269912
VP-Level                  0.286486
dtype: float64
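
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the pivot approach. It assumes the table from the question is loaded into a DataFrame named `df`; the constant COMMENTS column is left out for brevity, and the keyword form of `pivot` is used because newer pandas releases no longer accept these arguments positionally.

```python
import pandas as pd

# Rows copied from the question (COMMENTS omitted; it is the same on every row).
rows = [
    ("Director-Level", "EmailSend", 490),
    ("Manager-Level", "EmailSend", 305),
    ("Non-Managerial", "EmailSend", 272),
    ("Top Executive; C-Level", "EmailSend", 226),
    ("VP-Level", "EmailSend", 185),
    ("Director-Level", "FormSubmit", 131),
    ("Manager-Level", "FormSubmit", 74),
    ("Top Executive; C-Level", "FormSubmit", 61),
    ("VP-Level", "FormSubmit", 53),
    ("Non-Managerial", "FormSubmit", 52),
    ("Other", "EmailSend", 20),
    ("Other", "FormSubmit", 2),
]
df = pd.DataFrame(rows, columns=["JOB_ROLE", "ACTIVITY_TYPE", "COUNTS"])

# One row per JOB_ROLE, one column per ACTIVITY_TYPE, cells hold COUNTS.
piv = df.pivot(index="JOB_ROLE", columns="ACTIVITY_TYPE", values="COUNTS")

# Column-wise division yields one ratio per job role.
ratios = piv["FormSubmit"] / piv["EmailSend"]
print(ratios)
```

Note that `pivot` raises an error if a (JOB_ROLE, ACTIVITY_TYPE) pair occurs more than once; in that case `pivot_table` with `aggfunc='sum'` would be the usual substitute.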

Without a pivot:

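# Index by JOB_ROLE so the two subsets line up by role when divided
df.set_index('JOB_ROLE', drop=True, inplace=True)
emails = df[df.ACTIVITY_TYPE == 'EmailSend']
forms  = df[df.ACTIVITY_TYPE == 'FormSubmit']
# Series division aligns on the index, giving one ratio per JOB_ROLE
print(forms.COUNTS / emails.COUNTS)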
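
As for the groupby/apply attempt in the question: inside each group, the FormSubmit row and the EmailSend row carry different row labels, so dividing the two one-element Series aligns on those labels and produces NaN. A rough sketch of one way around that is to reduce each side to a scalar before dividing. The helper name `form_to_email_ratio` is made up for illustration, and only three roles from the question's data are used here.

```python
import pandas as pd

# Subset of the question's data, enough to show the idea.
df = pd.DataFrame(
    {
        "JOB_ROLE": ["Director-Level", "VP-Level", "Director-Level",
                     "VP-Level", "Other", "Other"],
        "ACTIVITY_TYPE": ["EmailSend", "EmailSend", "FormSubmit",
                          "FormSubmit", "EmailSend", "FormSubmit"],
        "COUNTS": [490, 185, 131, 53, 20, 2],
    }
)

def form_to_email_ratio(group):
    # Hypothetical helper: collapse each activity type to a scalar first,
    # so the division does not depend on matching row labels.
    form = group.loc[group["ACTIVITY_TYPE"] == "FormSubmit", "COUNTS"].sum()
    email = group.loc[group["ACTIVITY_TYPE"] == "EmailSend", "COUNTS"].sum()
    return form / email

# Selecting the columns explicitly keeps the grouping key out of each group.
ratios = df.groupby("JOB_ROLE")[["ACTIVITY_TYPE", "COUNTS"]].apply(form_to_email_ratio)
print(ratios)
```

This is slower than the pivot on large frames because `apply` calls the Python function once per group, but it stays close to the shape of the original attempt.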