Suppose I have a DataFrame like this:
0 Physician (Family Practice) 99
1 Transportation Security Officer (TSO) 94
2 Physical Therapist 94
3 Physician (Psychiatrist) 81
I want to count/group the DataFrame so that all rows containing the word "Physician" (a partial match) are summed together, giving the following:
0 Physician 180
1 Transportation Security Officer (TSO) 94
2 Physical Therapist 94
Answer:
Here is one way to do it (assuming the columns are named "Job" and "Num"):
>>> d.groupby(d.Job.map(lambda x: 'Physician' if 'Physician' in x else x))[['Num']].sum()
Num
Job
Physical Therapist 94
Physician 180
Transportation Security Officer (TSO) 94
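A self-contained, vectorized variant of the same idea is sketched below, under the same assumption that the columns are named "Job" and "Num". Instead of calling a Python lambda once per row, it builds a boolean mask with `Series.str.contains` and uses `Series.mask` to replace matching titles with the bare label. The variable names `is_physician`, `key` and `result` are illustrative only.

```python
import pandas as pd

# Sample data from the question; the column names "Job" and "Num" are assumed.
d = pd.DataFrame({
    "Job": [
        "Physician (Family Practice)",
        "Transportation Security Officer (TSO)",
        "Physical Therapist",
        "Physician (Psychiatrist)",
    ],
    "Num": [99, 94, 94, 81],
})

# True where "Physician" appears anywhere in the title.
# regex=False treats the pattern as a literal substring, not a regex.
is_physician = d["Job"].str.contains("Physician", regex=False)

# Replace matching titles with the label "Physician"; keep the others unchanged.
key = d["Job"].mask(is_physician, "Physician")

# Group by the computed key and sum only the numeric "Num" column.
result = d.groupby(key)["Num"].sum()
print(result)
```

Selecting `["Num"]` before `.sum()` keeps pandas from also trying to aggregate the text column "Job".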
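The idea is to group by a key that is set to "Physician" if the string contains "Physician", and to the original value otherwise. You can extend this to more partial matches. However, if you want to collapse many values this way, it may be more readable to add an extra column holding the broad category (such as "Physician") and then group by that column.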
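The extra-column approach might look roughly like the sketch below. The keyword-to-category mapping `CATEGORY_KEYWORDS`, the helper `categorize` and the column name "Category" are hypothetical, chosen only for illustration; the "Therapist" entry shows how further partial matches could be added.

```python
import pandas as pd

# Sample data from the question; the column names "Job" and "Num" are assumed.
d = pd.DataFrame({
    "Job": [
        "Physician (Family Practice)",
        "Transportation Security Officer (TSO)",
        "Physical Therapist",
        "Physician (Psychiatrist)",
    ],
    "Num": [99, 94, 94, 81],
})

# Hypothetical mapping: substring to look for -> broad category it collapses into.
# Entries are checked in insertion order; the first match wins.
CATEGORY_KEYWORDS = {
    "Physician": "Physician",
    "Therapist": "Therapist",
}

def categorize(job: str) -> str:
    """Return the broad category for a job title, or the title itself if nothing matches."""
    for keyword, category in CATEGORY_KEYWORDS.items():
        if keyword in job:
            return category
    return job

# Keep the category in its own column so the grouping key is explicit and inspectable.
d["Category"] = d["Job"].map(categorize)
result = d.groupby("Category")["Num"].sum()
print(result)
```

Keeping the mapping in one dictionary means new partial matches only require a new entry, rather than a longer chain of conditionals inside a lambda.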