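From a DataFrame, ignore only the hosts whose STATUS column contains nothing but "decommissioned".
For hosts that have both "decommissioned" and "active" statuses, select the row whose LAST_MODIFIED is later than that of the host's other rows.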
Input.csv
LAST_MODIFIED, HOST, STATUS
7/01/2019 10:20:00, host1, decommissioned
6/01/2019 02:10:02, host1, active
6/01/2019 02:10:02, host1, active
5/01/2019 02:10:02, host1, decommissioned
6/20/2019 10:20:00, host2, active
6/10/2019 01:20:02, host3, decommissioned
6/01/2019 02:10:00, host3, decommissioned
output.csv
LAST_MODIFIED, HOST, STATUS
7/01/2019 10:20:00, host1, decommissioned
6/20/2019 10:20:00, host2, active
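Answer 0 (score: 1)
Use transform + any to create a mask that finds the groups with at least one active element. Then apply sort + groupby + tail to the masked frame to get the row with the largest 'LAST_MODIFIED' for each host.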
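# Parse the timestamps so that sorting is chronological rather than lexicographic
df['LAST_MODIFIED'] = pd.to_datetime(df['LAST_MODIFIED'])
# True for every row whose HOST has at least one 'active' row
m = df.STATUS.eq('active').groupby(df.HOST).transform('any')
# Keep only those hosts, then take the most recent row per host
res = df[m].sort_values('LAST_MODIFIED').groupby('HOST').tail(1)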
res
        LAST_MODIFIED   HOST          STATUS
4 2019-06-20 10:20:00  host2          active
0 2019-07-01 10:20:00  host1  decommissioned
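
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same approach. The inline CSV string, the `skipinitialspace=True` option (used because the sample file has a space after each comma), the explicit month/day/year date format, and the helper name `latest_rows_for_active_hosts` are assumptions added for illustration; they are not part of the original answer.

```python
import io

import pandas as pd

# Sample data copied from Input.csv in the question (note the space after each comma).
CSV = """LAST_MODIFIED, HOST, STATUS
7/01/2019 10:20:00, host1, decommissioned
6/01/2019 02:10:02, host1, active
6/01/2019 02:10:02, host1, active
5/01/2019 02:10:02, host1, decommissioned
6/20/2019 10:20:00, host2, active
6/10/2019 01:20:02, host3, decommissioned
6/01/2019 02:10:00, host3, decommissioned
"""


def latest_rows_for_active_hosts(df):
    """Hypothetical helper: latest row per host, skipping hosts never 'active'."""
    df = df.copy()
    # Assumed format: month/day/year, matching the sample timestamps.
    df["LAST_MODIFIED"] = pd.to_datetime(
        df["LAST_MODIFIED"], format="%m/%d/%Y %H:%M:%S"
    )
    # Broadcast "this host has at least one active row" to every row of the host.
    mask = df["STATUS"].eq("active").groupby(df["HOST"]).transform("any")
    # Sort ascending by time, then keep the last (most recent) row of each host.
    return df[mask].sort_values("LAST_MODIFIED").groupby("HOST").tail(1)


# skipinitialspace drops the blank that follows each delimiter.
df = pd.read_csv(io.StringIO(CSV), skipinitialspace=True)
res = latest_rows_for_active_hosts(df)
print(res)
```

host3 is dropped because none of its rows are active, while host1 is kept and represented by its most recent row even though that row is decommissioned.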