I have the following DataFrame, and I want to select the services that have fewer than 2 "Healthy" instances. In this case, I want the rows for EmailService, UserService, and NotificationService.
CPU Service Memory Status
IP
10.22.11.150 13 StorageService 55 Healthy
10.22.11.90 23 StorageService 19 Healthy
10.22.11.91 10 EmailService 44 Healthy
10.22.11.92 69 UserService 1 Healthy
10.22.11.93 63 NotificationService 81 Healthy
10.22.11.93 87 NotificationService 98 Unhealthy
I think I need this groupby,
grouped = servers_df.groupby('Service')
but I'm not sure how to count values in the Status column and then select rows based on that count.
Answer 0 (score: 3)
Use transform with a lambda function to count the Healthy values per group and compare the count with 2, then filter by boolean indexing:
df = df[df.groupby('Service')['Status'].transform(lambda x: (x=='Healthy').sum() < 2)]
print (df)
CPU Service Memory Status
IP
10.22.11.91 10 EmailService 44 Healthy
10.22.11.92 69 UserService 1 Healthy
10.22.11.93 63 NotificationService 81 Healthy
10.22.11.93 87 NotificationService 98 Unhealthy
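For readers who want to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the lambda is swapped for a vectorized variant (`eq` followed by a grouped `transform("sum")`), which is an assumption on my part rather than the answer's exact code. The names `is_healthy` and `healthy_per_row` are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the question; IP is used as the index, as in the post.
servers_df = pd.DataFrame(
    {
        "IP": ["10.22.11.150", "10.22.11.90", "10.22.11.91",
               "10.22.11.92", "10.22.11.93", "10.22.11.93"],
        "CPU": [13, 23, 10, 69, 63, 87],
        "Service": ["StorageService", "StorageService", "EmailService",
                    "UserService", "NotificationService", "NotificationService"],
        "Memory": [55, 19, 44, 1, 81, 98],
        "Status": ["Healthy", "Healthy", "Healthy",
                   "Healthy", "Healthy", "Unhealthy"],
    }
).set_index("IP")

# True where the row is Healthy.
is_healthy = servers_df["Status"].eq("Healthy")

# Number of Healthy rows in each row's service, broadcast back to every row.
healthy_per_row = is_healthy.groupby(servers_df["Service"]).transform("sum")

# Keep rows whose service has fewer than 2 Healthy instances.
result = servers_df[healthy_per_row < 2]
services = sorted(result["Service"].unique())
print(services)
```

Because the count is broadcast to every row, the Unhealthy NotificationService row is kept along with the Healthy one, which matches the output shown above.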
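If you only want to check for a single Healthy value per group, use duplicated with keep=False to mark all duplicates, and chain it with a comparison against Healthy to flag the rows of services that have more than one Healthy instance. Then invert the condition with ~ and filter again by boolean indexing.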
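The code for this variant did not survive in the text, so the following is only a reconstruction from the description, not necessarily what the answer originally showed. It assumes duplicates are judged on the (`Service`, `Status`) pair and, for simplicity, keeps `IP` as an ordinary column. The names `multi_healthy` and `filtered` are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (IP kept as a regular column here).
df = pd.DataFrame(
    {
        "IP": ["10.22.11.150", "10.22.11.90", "10.22.11.91",
               "10.22.11.92", "10.22.11.93", "10.22.11.93"],
        "CPU": [13, 23, 10, 69, 63, 87],
        "Service": ["StorageService", "StorageService", "EmailService",
                    "UserService", "NotificationService", "NotificationService"],
        "Memory": [55, 19, 44, 1, 81, 98],
        "Status": ["Healthy", "Healthy", "Healthy",
                   "Healthy", "Healthy", "Unhealthy"],
    }
)

# Rows whose (Service, Status) pair occurs more than once AND whose Status is
# Healthy, i.e. Healthy rows of services with at least two Healthy instances.
multi_healthy = (
    df.duplicated(["Service", "Status"], keep=False) & df["Status"].eq("Healthy")
)

# Invert the mask and filter by boolean indexing.
filtered = df[~multi_healthy]
print(filtered)
```

Note that this mask works row by row: on the sample data it gives the same rows as the transform approach, but an Unhealthy row belonging to a service with several Healthy rows would still be kept.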
Answer 1 (score: 1)
You can also use filter.
df.groupby("Service").filter(lambda x: len(x[x.Status == "Healthy"]) < 2)
According to jezrael's experiment in this answer, this may be somewhat slower.
Another way: use apply (adapted from jezrael's transform solution)
df.groupby('Service').apply(
    lambda x: x if (x.Status == 'Healthy').sum() < 2 else None)
IP CPU Service Memory Status
Service
EmailService 2 10.22.11.91 10 EmailService 44 Healthy
NotificationService 4 10.22.11.93 63 NotificationService 81 Healthy
5 10.22.11.93 87 NotificationService 98 Unhealthy
UserService 3 10.22.11.92 69 UserService 1 Healthy
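As a rough check, the sketch below compares `filter` with a transform-style mask on the sample data. Unlike the `apply` output above, which gains an extra `Service` index level, both of these keep the original index and row order. The DataFrame is rebuilt by hand from the question, and the names `by_filter` and `by_mask` are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (IP kept as a regular column here).
df = pd.DataFrame(
    {
        "IP": ["10.22.11.150", "10.22.11.90", "10.22.11.91",
               "10.22.11.92", "10.22.11.93", "10.22.11.93"],
        "CPU": [13, 23, 10, 69, 63, 87],
        "Service": ["StorageService", "StorageService", "EmailService",
                    "UserService", "NotificationService", "NotificationService"],
        "Memory": [55, 19, 44, 1, 81, 98],
        "Status": ["Healthy", "Healthy", "Healthy",
                   "Healthy", "Healthy", "Unhealthy"],
    }
)

# filter: keep whole groups for which the function returns True.
by_filter = df.groupby("Service").filter(
    lambda g: g["Status"].eq("Healthy").sum() < 2
)

# Mask-based variant: per-row count of Healthy rows in the same service.
mask = df["Status"].eq("Healthy").groupby(df["Service"]).transform("sum") < 2
by_mask = df[mask]

# Both keep the original index labels (2, 3, 4, 5) and row order.
print(by_filter.index.tolist())
print(by_filter.equals(by_mask))
```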
Answer 2 (score: 1)
IIUC (if I understand correctly):
s=df[df.Status=='Healthy'].groupby('Service').Service.count().lt(2)
df.loc[df.Service.isin(s[s].index)]
IP CPU Service Memory Status
2 10.22.11.91 10 EmailService 44 Healthy
3 10.22.11.92 69 UserService 1 Healthy
4 10.22.11.93 63 NotificationService 81 Healthy
5 10.22.11.93 87 NotificationService 98 Unhealthy
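One edge case worth keeping in mind: because this approach filters to Healthy rows before grouping, a service with no Healthy rows at all never appears in the grouped counts and so is not selected, even though it has fewer than 2 Healthy instances. The sketch below illustrates this with one extra, hypothetical row (`CacheService`, not part of the original data) and a variant that counts on the full frame. It uses `size()` instead of the answer's `Service.count()`, which gives the same counts here.

```python
import pandas as pd

# Sample data from the question plus one HYPOTHETICAL service that has
# no Healthy rows at all (CacheService is an assumption for illustration).
df = pd.DataFrame(
    {
        "IP": ["10.22.11.150", "10.22.11.90", "10.22.11.91", "10.22.11.92",
               "10.22.11.93", "10.22.11.93", "10.22.11.94"],
        "CPU": [13, 23, 10, 69, 63, 87, 50],
        "Service": ["StorageService", "StorageService", "EmailService",
                    "UserService", "NotificationService",
                    "NotificationService", "CacheService"],
        "Memory": [55, 19, 44, 1, 81, 98, 70],
        "Status": ["Healthy", "Healthy", "Healthy", "Healthy",
                   "Healthy", "Unhealthy", "Unhealthy"],
    }
)

# Approach in the style of the answer: filter to Healthy rows first, then count.
s = df[df["Status"] == "Healthy"].groupby("Service").size().lt(2)
selected_a = df.loc[df["Service"].isin(s[s].index)]

# Variant: count Healthy rows per service on the full frame, so services
# with zero Healthy rows get a count of 0 instead of disappearing.
healthy_counts = df["Status"].eq("Healthy").groupby(df["Service"]).sum()
low = healthy_counts[healthy_counts < 2].index
selected_b = df[df["Service"].isin(low)]

print(sorted(selected_a["Service"].unique()))
print(sorted(selected_b["Service"].unique()))
```

Whether a service with zero Healthy instances should be included depends on the intent of the question; the variant is only needed if it should.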