How do I count occurrences of a string in a DataFrame column?

Asked: 2014-09-29 19:14:42

Tags: python pandas dataframe

Suppose I have a DataFrame like this:

0                           Physician (Family Practice)   99
1                 Transportation Security Officer (TSO)   94
2                                    Physical Therapist   94
3                              Physician (Psychiatrist)   81

I want to count/group the DataFrame so that all rows containing the word "Physician" (partial match) are summed together, giving me the following:

0                                             Physician   180
1                 Transportation Security Officer (TSO)   94
2                                    Physical Therapist   94

1 Answer:

Answer 0: (score: 1)

Here is one way (assuming the columns are named "Job" and "Num"):

>>> d.groupby(d.Job.map(lambda x: 'Physician' if 'Physician' in x else x)).sum()
                                       Num
Job                                       
Physical Therapist                      94
Physician                              180
Transportation Security Officer (TSO)   94

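The idea is to group by a key that is set to "Physician" if the string contains "Physician", and to the original value otherwise. You can extend this to more partial matches. However, if you want to collapse many values this way, it may be more readable to add an extra column holding the broad category (e.g. "Physician") and then group on that column.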
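The extra-column approach might look something like the sketch below. The sample data is rebuilt from the question, "Job" and "Num" are the column names assumed in the answer, and the `categories` list, the `to_category` helper, and the "Category" column name are hypothetical names chosen only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question; "Job" and "Num" are assumed column names.
d = pd.DataFrame({
    "Job": [
        "Physician (Family Practice)",
        "Transportation Security Officer (TSO)",
        "Physical Therapist",
        "Physician (Psychiatrist)",
    ],
    "Num": [99, 94, 94, 81],
})

# Hypothetical list of broad categories to collapse by substring match.
# Add more entries here to handle more partial matches.
categories = ["Physician"]

def to_category(job):
    """Return the first broad category contained in job, else job unchanged."""
    for cat in categories:
        if cat in job:
            return cat
    return job

# Store the broad category in its own column, then group on that column.
d["Category"] = d["Job"].map(to_category)
result = d.groupby("Category")["Num"].sum()
print(result)
```

Selecting "Num" explicitly before calling `sum()` keeps the result limited to the numeric column, which avoids relying on how `sum()` treats the string column "Job" (that behavior has changed between pandas versions).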