I built a pandas DataFrame to look up instances; `instanceList` already holds the details of every instance.
import pandas as pd

instanceList = [
    [
        "web-mgmt",
        "i-0268214908adb3949",
        "running",
        "2019-05-06 13:30:11+00:00"
    ],
    [
        "app-srv-1",
        "i-088d90fe72g67fb4c",
        "running",
        "2019-06-04 03:46:03+00:00"
    ],
    [
        "web-mgmt",
        "i-0cwewrgbr45fc8823",
        "running",
        "2019-05-22 14:45:32+00:00"
    ]
]
df = pd.DataFrame(instanceList, columns=['InstanceName', 'InstanceId', 'InstanceState', 'LaunchTime'])
# Split the launch timestamp into separate date and time columns
df['Dates'] = pd.to_datetime(df['LaunchTime']).dt.date
df['Time'] = pd.to_datetime(df['LaunchTime']).dt.time
del df['LaunchTime']
The output of this filter is:
   InstanceName           InstanceId  InstanceState       Dates      Time
2      web-mgmt  i-0268214908adb3949        running  2019-04-19  14:25:11
3     app-srv-1  i-088d90fe72g67fb4c        running  2019-06-04  03:46:03
5      web-mgmt  i-0cwewrgbr45fc8823        running  2019-05-06  10:30:10
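Now I want to meet the following conditions:
a. Find duplicates based on the name tag. If there are no duplicates, print a message.
b. If duplicates are found, remove the latest instance by looking at the dates, so that I am left with all the older instances in the list.
So far I am able to find the duplicate instances with the code below: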
# Find duplicate instance based on tag name
duplicateRows = df[df.duplicated(['InstanceName'], keep=False)]
print(duplicateRows, sep='\n')
It outputs the table below.
   InstanceName           InstanceId  InstanceState       Dates      Time
2      web-mgmt  i-0268214908adb3949        running  2019-04-19  14:25:11
5      web-mgmt  i-0cwewrgbr45fc8823        running  2019-05-06  10:30:10
Is there a way to write a conditional statement like the one below? I can't figure it out; please help.
if df<SOMETHING> >= 1
    duplicateRows = df[df.duplicated(['InstanceName'], keep=False)]
    latest = duplicateRows.max()
    older = duplicateRows.drop(latest) <<-- error: datetime.time(14, 25, 11)] not found in axis
    print(older)
else:
    print message
Answer 0 (score: 1)
Convert the instance names into a list of unique values:
l = list(set(df['InstanceName'].tolist()))
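One caveat: a `set` has no guaranteed order, so `l` may list the names in a different order than the DataFrame does. If the order matters, `Series.unique()` is a possible alternative. Below is a minimal sketch; the one-column DataFrame is made-up sample data that only reuses the names from the question.

```python
import pandas as pd

# Hypothetical sample: only the InstanceName column from the question
df = pd.DataFrame({"InstanceName": ["web-mgmt", "app-srv-1", "web-mgmt"]})

# unique() returns each name once, in order of first appearance
names = df["InstanceName"].unique().tolist()
print(names)
```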
Use the list to filter df and drop what is not needed:
x = []
for i in l:
    # Copy the slice so the changes below do not touch df itself
    df_i = df.loc[df['InstanceName'] == i].copy()
    if len(df_i) > 1:
        # Sort oldest first, then drop the last row (the latest launch)
        df_i = df_i.sort_values(['Dates', 'Time']).head(len(df_i) - 1)
    # Names without duplicates are kept unchanged
    x.append(df_i)
df_final = pd.concat(x, ignore_index=True)
for i, row in df_final.iterrows():
    print(row)
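As for the error in the question: `duplicateRows.max()` returns the column-wise maxima as a Series, while `drop` expects index labels, which is likely why it reports "not found in axis". A loop-free variant that also covers the "print a message" condition could look like the sketch below. It assumes `LaunchTime` is kept as a single datetime column instead of being split into `Dates` and `Time`; the helper name `older_duplicates` and the message text are illustrative, not part of the original code.

```python
import pandas as pd

# Sample rows copied from the question's instanceList
instance_list = [
    ["web-mgmt", "i-0268214908adb3949", "running", "2019-05-06 13:30:11+00:00"],
    ["app-srv-1", "i-088d90fe72g67fb4c", "running", "2019-06-04 03:46:03+00:00"],
    ["web-mgmt", "i-0cwewrgbr45fc8823", "running", "2019-05-22 14:45:32+00:00"],
]
df = pd.DataFrame(
    instance_list,
    columns=["InstanceName", "InstanceId", "InstanceState", "LaunchTime"],
)
# Assumption: keep one real datetime column so launches compare directly
df["LaunchTime"] = pd.to_datetime(df["LaunchTime"])


def older_duplicates(frame):
    """Return duplicated-name rows minus the latest launch of each name.

    Returns None when no instance name occurs more than once.
    """
    dup_mask = frame.duplicated(["InstanceName"], keep=False)
    if not dup_mask.any():
        return None
    dups = frame[dup_mask]
    # idxmax yields the index label of the latest LaunchTime per name
    latest_labels = dups.groupby("InstanceName")["LaunchTime"].idxmax()
    return dups.drop(index=latest_labels.tolist())


older = older_duplicates(df)
if older is None:
    print("No duplicate instance names found")
else:
    print(older)
```

With the three sample rows, only the earlier `web-mgmt` launch remains in `older`. Dropping by index label rather than by value is what avoids the "not found in axis" error.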