pandas: condition that is True when duplicate rows are found, then drop the latest one

Date: 2019-06-04 11:40:32

Tags: python pandas dataframe amazon-ec2

I built a pandas DataFrame to look up instances, where `instanceList` already holds the details of every instance.

import pandas as pd

instanceList = [
    [
        "web-mgmt",
        "i-0268214908adb3949",
        "running",
        "2019-05-06 13:30:11+00:00"
    ],
    [
        "app-srv-1",
        "i-088d90fe72g67fb4c",
        "running",
        "2019-06-04 03:46:03+00:00"
    ],
    [
        "web-mgmt",
        "i-0cwewrgbr45fc8823",
        "running",
        "2019-05-22 14:45:32+00:00"
    ]
]
df = pd.DataFrame(instanceList, columns=['InstanceName', 'InstanceId', 'InstanceState', 'LaunchTime'])
df['Dates'] = pd.to_datetime(df['LaunchTime']).dt.date
df['Time'] = pd.to_datetime(df['LaunchTime']).dt.time
del df['LaunchTime']

The output of this is:

   InstanceName           InstanceId InstanceState       Dates      Time
2      web-mgmt  i-0268214908adb3949       running  2019-04-19  14:25:11
3     app-srv-1  i-088d90fe72g67fb4c       running  2019-06-04  03:46:03
5      web-mgmt  i-0cwewrgbr45fc8823       running  2019-05-06  10:30:10

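Now I want the following conditions to be met:

a. Find duplicates based on the name tag. If there are no duplicates, print a message.

b. If duplicates are found, drop the latest instance by looking at the date, so that I am left with all the older instances in the list.

So far I have been able to find the duplicate instances like this: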

# Find duplicate instance based on tag name
duplicateRows = df[df.duplicated(['InstanceName'], keep=False)]
print(duplicateRows, sep='\n')

which outputs the table below.

   InstanceName           InstanceId InstanceState       Dates      Time
2      web-mgmt  i-0268214908adb3949       running  2019-04-19  14:25:11
5      web-mgmt  i-0cwewrgbr45fc8823       running  2019-05-06  10:30:10

Is there any way to write a conditional statement like the one below? I can't figure it out, please help.

if df<SOMETHING> >= 1
  duplicateRows = df[df.duplicated(['InstanceName'], keep=False)]
  latest = duplicateRows.max()
  older = duplicateRows.drop(latest) <<-- error: datetime.time(14, 25, 11)] not found in axis
  print(older)
else:
  print message

1 answer:

Answer 0 (score: 1)

Convert the instance names into a list of unique values:

l = list(set(df['InstanceName'].tolist()))

Use the list to filter df and drop the rows you don't want:

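x = []
for i in l:
    # Rows for this instance name, newest launch first
    df_i = df.loc[df['InstanceName'] == i].sort_values(['Dates', 'Time'], ascending=False)
    if len(df_i) > 1:
        # Duplicated name: drop the first (newest) row and keep the older ones
        df_i = df_i.iloc[1:]
    x.append(df_i)

# Stitch the per-name pieces back into one frame
df_final = pd.concat(x, ignore_index=True)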

for i,row in df_final.iterrows():
    print(row)
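
For reference, the `if`/`else` shape asked about in the question can probably be written without a loop. The error in the question comes from `duplicateRows.max()` returning a Series of column-wise maxima rather than row labels, so `drop` cannot find those values in the index. The sketch below is one possible way around that, not the answerer's method: it assumes `LaunchTime` is kept as a single datetime column instead of being split into `Dates` and `Time`, and it uses the sample data from the question.

```python
import pandas as pd

# Sample data copied from the question
instanceList = [
    ["web-mgmt", "i-0268214908adb3949", "running", "2019-05-06 13:30:11+00:00"],
    ["app-srv-1", "i-088d90fe72g67fb4c", "running", "2019-06-04 03:46:03+00:00"],
    ["web-mgmt", "i-0cwewrgbr45fc8823", "running", "2019-05-22 14:45:32+00:00"],
]
df = pd.DataFrame(
    instanceList,
    columns=['InstanceName', 'InstanceId', 'InstanceState', 'LaunchTime'],
)
# Assumption: keep one datetime column so rows can be ordered by launch time
df['LaunchTime'] = pd.to_datetime(df['LaunchTime'])

# True for every row whose InstanceName occurs more than once
dup_mask = df.duplicated(['InstanceName'], keep=False)

if dup_mask.any():
    # Newest launch first, so the first row seen per name is the latest one
    duplicateRows = df[dup_mask].sort_values('LaunchTime', ascending=False)
    # keep='first' leaves the newest row unmarked; the marked rows are the older ones
    older = duplicateRows[duplicateRows.duplicated(['InstanceName'], keep='first')]
    print(older)
else:
    older = df.iloc[0:0]  # empty frame with the same columns
    print("No duplicate instance names found")
```

Here `dup_mask.any()` plays the role of the `df<SOMETHING> >= 1` placeholder: it is True as soon as at least one name is repeated. Unlike the loop above, `older` contains only the older rows of duplicated names, so instances with a unique name are left out.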