Search a CSV for matching fields and use the initial date

Time: 2016-08-15 08:20:40

Tags: python csv duplicates

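I am trying to search a CSV file for rows with duplicate device names. The output should record the date of the first matching row and the date of the last row found. I need some help removing the duplicate device names from the CSV file while also recording when each device was first and last seen.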

import time as epoch

# AlertTime, DeviceName, Status
Input = [['14/08/2016 13:00', 'device-A', 'UP'], ['14/08/2016 13:15', 'device-B', 'DOWN'], ['15/08/2016 17:30', 'device-A', 'UP']]

# FirstSeen, LastSeen, DeviceName, Status
Output = []

# Last 48 hours
now = epoch.time()
cutoff = now - (172800)

for i in Input:
    AlertTime = epoch.mktime(epoch.strptime(i[0], '%d/%m/%Y %H:%M'))
    if AlertTime > cutoff:
        Result = [i[0], i[0], i[1], i[2]]
        Output.append(Result)

print(Output)

Output (3 entries):

[['14/08/2016 13:00', '14/08/2016 13:00', 'device-A', 'UP'], ['14/08/2016 13:15', '14/08/2016 13:15', 'device-B', 'DOWN'], ['15/08/2016 17:30', '15/08/2016 17:30', 'device-A', 'UP']]

Wanted output (2 entries):

[['14/08/2016 13:15', '14/08/2016 13:15', 'device-B', 'DOWN'], ['14/08/2016 13:00', '15/08/2016 17:30', 'device-A', 'UP']]

2 answers:

Answer 0 (score: 1)

As Vedang Mehta said in the comments, you can use a dict to store the data.

    my_dict = {}
    for i in Input:
        AlertTime = epoch.mktime(epoch.strptime(i[0], '%d/%m/%Y %H:%M'))
        if AlertTime > cutoff:
            #if we have seen this device before, update it
            if i[1] in my_dict:
                my_dict[i[1]] = (my_dict[i[1]][0], i[0], i[2])
            #if we haven't seen it, add it
            else:
                my_dict[i[1]] = (i[0],i[0],i[2])

After this, all your devices will be stored in my_dict, each mapped to a (first_seen, last_seen, status) tuple.
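To get from that dict back to the list layout the question asks for (FirstSeen, LastSeen, DeviceName, Status), something like the following should work. This is only a sketch: `dict_to_rows` and `sample` are names made up here for illustration, and `sample` stands in for the `my_dict` built above.

```python
def dict_to_rows(devices):
    """Turn {device_name: (first_seen, last_seen, status)} into
    [first_seen, last_seen, device_name, status] rows."""
    return [[first_seen, last_seen, device_name, status]
            for device_name, (first_seen, last_seen, status) in devices.items()]

# Hypothetical data in the same shape as my_dict
sample = {
    'device-A': ('14/08/2016 13:00', '15/08/2016 17:30', 'UP'),
    'device-B': ('14/08/2016 13:15', '14/08/2016 13:15', 'DOWN'),
}

rows = dict_to_rows(sample)
print(rows)
```

The row order follows the dict's iteration order, so it may differ from the order shown in the question.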

Answer 1 (score: 1)

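You can use an OrderedDict to preserve the order in which the devices are seen in the CSV file. Using a dictionary removes the duplicates automatically.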

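The code below tries to update an existing dictionary entry; if the entry does not exist yet, Python raises a KeyError exception. In that case, a new entry is added with the same first and last alert time. When an entry is updated, its existing first_seen is kept and the entry is updated with the most recently found alert_time and status. Finally, the dictionary is converted into the required output format: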

from collections import OrderedDict

# AlertTime, DeviceName, Status
input_data = [['14/08/2016 13:00', 'device-A', 'UP'], ['14/08/2016 13:15', 'device-B', 'DOWN'], ['15/08/2016 17:30', 'device-A', 'UP']]

entries = OrderedDict()

for alert_time, device_name, status in input_data:
    try:
        entries[device_name] = [entries[device_name][0], alert_time, status]
    except KeyError:
        entries[device_name] = [alert_time, alert_time, status]

# Convert the dictionary of entries into the required format        
output_data = [[device_name, first_seen, last_seen, status] for device_name, [first_seen, last_seen, status] in entries.items()]

print(output_data)

The output is:

[['device-A', '14/08/2016 13:00', '15/08/2016 17:30', 'UP'], ['device-B', '14/08/2016 13:15', '14/08/2016 13:15', 'DOWN']]
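Both answers rely on the rows already being in chronological order, and the second one leaves out the 48-hour cutoff from the question. If neither can be taken for granted, a variant along these lines might help. It is a sketch under assumptions, not a drop-in solution: `first_last_seen`, `FMT` and `csv_text` are hypothetical names, the inline CSV text stands in for the real file, and the column order (AlertTime, DeviceName, Status) is taken from the question.

```python
import csv
import io
from datetime import datetime

FMT = '%d/%m/%Y %H:%M'  # timestamp format used in the question

def first_last_seen(rows, cutoff=None):
    """Return [first_seen, last_seen, device_name, status] per device.

    rows: iterable of [alert_time, device_name, status]
    cutoff: optional datetime; rows at or before it are skipped
    """
    seen = {}  # device_name -> [first_dt, last_dt, status]
    for alert_time, device_name, status in rows:
        when = datetime.strptime(alert_time, FMT)
        if cutoff is not None and when <= cutoff:
            continue
        if device_name not in seen:
            seen[device_name] = [when, when, status]
            continue
        entry = seen[device_name]
        if when < entry[0]:
            entry[0] = when
        if when >= entry[1]:
            entry[1] = when
            entry[2] = status  # keep the status of the most recent alert
    return [[first.strftime(FMT), last.strftime(FMT), name, status]
            for name, (first, last, status) in seen.items()]

# Hypothetical CSV content, deliberately out of chronological order
csv_text = """15/08/2016 17:30,device-A,UP
14/08/2016 13:15,device-B,DOWN
14/08/2016 13:00,device-A,DOWN
"""

result = first_last_seen(csv.reader(io.StringIO(csv_text)))
print(result)
```

Comparing parsed datetime objects rather than the raw strings matters here, because 'dd/mm/yyyy' strings do not sort chronologically. For the question's "last 48 hours" filter, passing `cutoff=datetime.now() - timedelta(hours=48)` (with `timedelta` imported from `datetime`) should give the equivalent behaviour.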