Question

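Each asset ID can have two types of issue ("RSL Critical Deviation" or "RSL and TX Power Deviation"), and we need to count how many times each issue occurs within the same asset ID.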

Asset    ID    Categorization Tier 3
4053     0001  RSL Critical Deviation
4054     0001  RSL and TX Power Deviation
3342     0005  RSL and TX Power Deviation
3343     0005  RSL and TX Power Deviation
3344     0005  RSL and TX Power Deviation
3345     0005  RSL and TX Power Deviation
3346     0005  RSL and TX Power Deviation
4363     0040  RSL and TX Power Deviation
4055     0046  RSL Critical Deviation
4056     0046  RSL Critical Deviation

The result should be:

Asset ID  Categorization Tier 3     Count 
0001      RSL Critical Deviation        1
          RSL and TX Power Deviation    1
0005      RSL Critical Deviation        0
          RSL and TX Power Deviation    5

Answer 1

df.groupby(['ID', 'Categorization Tier 3']).size()

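A GROUP BY clause in SQL can be applied to multiple columns, and the same is true in pandas. The pandas equivalents of SQL's COUNT aggregation are size and count. Their differences are covered in this SO question; in short, size counts every row in a group, while count counts only the non-null values in each column.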
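Note that size only reports (ID, category) pairs that actually occur, so the zero row for asset 0005 in the desired output would be missing. The following is a minimal sketch of one way to fill in those zeros, not necessarily what the answerer had in mind. It assumes the columns are named "ID" and "Categorization Tier 3" and that the IDs are stored as strings; both are guesses based on the table in the question.

```python
import pandas as pd

# Sample data copied from the question. The column names and the string
# dtype of "ID" are assumptions based on the table header.
df = pd.DataFrame({
    "ID": ["0001", "0001", "0005", "0005", "0005",
           "0005", "0005", "0040", "0046", "0046"],
    "Categorization Tier 3": [
        "RSL Critical Deviation",
        "RSL and TX Power Deviation",
        "RSL and TX Power Deviation",
        "RSL and TX Power Deviation",
        "RSL and TX Power Deviation",
        "RSL and TX Power Deviation",
        "RSL and TX Power Deviation",
        "RSL and TX Power Deviation",
        "RSL Critical Deviation",
        "RSL Critical Deviation",
    ],
})

# Count rows per (ID, category) pair; only pairs present in the data appear.
counts = df.groupby(["ID", "Categorization Tier 3"]).size()

# Build every possible (ID, category) combination, then reindex so that
# combinations absent from the data get a count of 0.
full_index = pd.MultiIndex.from_product(
    [df["ID"].unique(), df["Categorization Tier 3"].unique()],
    names=["ID", "Categorization Tier 3"],
)
full_counts = counts.reindex(full_index, fill_value=0)

print(full_counts)
```

Reindexing against the full product of IDs and categories keeps the result as a Series with the same two-level index, which matches the layout requested in the question.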

How to count the number of repeats of a certain class per group
