Question

Suppose I have a dataframe like this:

0                           Physician (Family Practice)   99
1                 Transportation Security Officer (TSO)   94
2                                    Physical Therapist   94
3                              Physician (Psychiatrist)   81

I want to count/group the dataframe so that all rows containing the word "Physician" (a partial match) are summed together, giving me the following:

0                                             Physician   180
1                 Transportation Security Officer (TSO)   94
2                                    Physical Therapist   94

Answer 1

Here is one way to do it (assuming the columns are named "Job" and "Num"):

>>> d.groupby(d.Job.map(lambda x: 'Physician' if 'Physician' in x else x)).sum()
                                       Num
Job                                       
Physical Therapist                      94
Physician                              180
Transportation Security Officer (TSO)   94

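The idea is to group by a label that is set to "Physician" if the string contains "Physician", and to the original value otherwise. You can extend this to more partial matches. However, if you want to collapse many values this way, it may be more readable to add a separate column holding the broad category (e.g. "Physician") and then group by that column.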
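As a rough sketch of that last suggestion, the snippet below rebuilds the sample data, adds a category column, and groups by it. The column name "Category", the `categories` mapping, and the `categorize` helper are hypothetical names chosen for illustration; only "Job" and "Num" come from the assumption above.

```python
import pandas as pd

# Sample data from the question, with the assumed column names "Job" and "Num".
d = pd.DataFrame({
    "Job": [
        "Physician (Family Practice)",
        "Transportation Security Officer (TSO)",
        "Physical Therapist",
        "Physician (Psychiatrist)",
    ],
    "Num": [99, 94, 94, 81],
})

# Hypothetical mapping: substring to look for -> broad category label.
# Add more entries here to collapse further partial matches.
categories = {"Physician": "Physician"}

def categorize(job):
    """Return the first matching broad category, or the job title unchanged."""
    for needle, label in categories.items():
        if needle in job:
            return label
    return job

# Store the category in its own column, then group by that column.
d["Category"] = d["Job"].map(categorize)
totals = d.groupby("Category")["Num"].sum()
print(totals)
```

Keeping the category in its own column leaves the original job titles intact, so the same dataframe can still be grouped by "Job" when the finer breakdown is needed.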
