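I am trying to search a CSV file for rows that have duplicate device names. The output should record the date of the first matching row and the date of the last row found. I need some help removing the duplicate device names from the CSV file while also recording when each device was first and last seen.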
import time as epoch

# AlertTime, DeviceName, Status
Input = [['14/08/2016 13:00', 'device-A', 'UP'], ['14/08/2016 13:15', 'device-B', 'DOWN'], ['15/08/2016 17:30', 'device-A', 'UP']]

# FirstSeen, LastSeen, DeviceName, Status
Output = []

# Last 48 hours
now = epoch.time()
cutoff = now - (172800)

for i in Input:
    AlertTime = epoch.mktime(epoch.strptime(i[0], '%d/%m/%Y %H:%M'))
    if AlertTime > cutoff:
        Result = [i[0], i[0], i[1], i[2]]
        Output.append(Result)

print(Output)
Output (3 entries):
[['14/08/2016 13:00', '14/08/2016 13:00', 'device-A', 'UP'], ['14/08/2016 13:15', '14/08/2016 13:15', 'device-B', 'DOWN'], ['15/08/2016 17:30', '15/08/2016 17:30', 'device-A', 'UP']]
Desired output (2 entries):
[['14/08/2016 13:15', '14/08/2016 13:15', 'device-B', 'DOWN'], ['14/08/2016 13:00', '15/08/2016 17:30', 'device-A', 'UP']]
Answer 0 (score: 1)
As Vedang Mehta said in the comments, you can use a dict to store the data.
# Reuses Input, cutoff and epoch from the question's code
my_dict = {}
for i in Input:
    AlertTime = epoch.mktime(epoch.strptime(i[0], '%d/%m/%Y %H:%M'))
    if AlertTime > cutoff:
        # If we have seen this device before, keep its first_seen and update the rest
        if i[1] in my_dict:
            my_dict[i[1]] = (my_dict[i[1]][0], i[0], i[2])
        # If we haven't seen it, add it
        else:
            my_dict[i[1]] = (i[0], i[0], i[2])
After this, all of your devices will be stored in my_dict, keyed by device name, each with its first_seen, last_seen and status.
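To get from my_dict back to the list format the question asks for, something like the following might work. It is only a sketch: the summarize function, the reference time and the window parameter are names made up for illustration, and the reference time is fixed so that the 2016 sample rows fall inside the 48-hour window. It also assumes Python 3.7 or later, where a plain dict keeps insertion order, and that the rows arrive in chronological order.

```python
from datetime import datetime, timedelta

TIME_FORMAT = '%d/%m/%Y %H:%M'

# AlertTime, DeviceName, Status (sample rows from the question)
rows = [
    ['14/08/2016 13:00', 'device-A', 'UP'],
    ['14/08/2016 13:15', 'device-B', 'DOWN'],
    ['15/08/2016 17:30', 'device-A', 'UP'],
]


def summarize(rows, now, window=timedelta(hours=48)):
    """Return one [first_seen, last_seen, device_name, status] list per device."""
    cutoff = now - window
    seen = {}
    for alert_time, device_name, status in rows:
        if datetime.strptime(alert_time, TIME_FORMAT) <= cutoff:
            continue  # older than the window, skip it
        if device_name in seen:
            # Known device: keep first_seen, overwrite last_seen and status
            first_seen = seen[device_name][0]
            seen[device_name] = (first_seen, alert_time, status)
        else:
            # New device: first_seen and last_seen start out equal
            seen[device_name] = (alert_time, alert_time, status)
    return [[first, last, name, status]
            for name, (first, last, status) in seen.items()]


# Hypothetical fixed reference time; in real use this would be datetime.now()
reference = datetime(2016, 8, 16, 0, 0)
output = summarize(rows, reference)
print(output)
```

The devices come out in the order they were first seen, so device-A is listed before device-B here.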
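Answer 1 (score: 1)
You can use an OrderedDict to preserve the order in which devices are seen in the CSV file. Because dictionary keys are unique, using the device name as the key removes the duplicates automatically.
The following works by trying to update an existing dictionary entry; if the entry does not exist yet, Python raises a KeyError exception. In that case, a new entry is added with the same start and end alert time. When updating an entry, the existing first_seen is kept and the entry is updated with the latest alert_time and status found. Finally, the dictionary is parsed to create the required output format: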
from collections import OrderedDict

# AlertTime, DeviceName, Status
input_data = [['14/08/2016 13:00', 'device-A', 'UP'], ['14/08/2016 13:15', 'device-B', 'DOWN'], ['15/08/2016 17:30', 'device-A', 'UP']]

entries = OrderedDict()

for alert_time, device_name, status in input_data:
    try:
        # Known device: keep first_seen, update last_seen and status
        entries[device_name] = [entries[device_name][0], alert_time, status]
    except KeyError:
        # New device: first_seen and last_seen start out equal
        entries[device_name] = [alert_time, alert_time, status]

# Convert the dictionary of entries into the required format
output_data = [
    [device_name, first_seen, last_seen, status]
    for device_name, [first_seen, last_seen, status] in entries.items()
]

print(output_data)
The output is:
[['device-A', '14/08/2016 13:00', '15/08/2016 17:30', 'UP'], ['device-B', '14/08/2016 13:15', '14/08/2016 13:15', 'DOWN']]
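Both answers treat the last row processed for a device as its last sighting, which holds only if the file is in chronological order. If that is not guaranteed, a variant that compares parsed timestamps may be safer. The sketch below is illustrative rather than definitive: first_and_last_seen and the sample CSV text are made up for the example, the file is assumed to have no header row and exactly three columns per line, and the 48-hour cutoff is left out to keep it short.

```python
import csv
import io
from datetime import datetime

TIME_FORMAT = '%d/%m/%Y %H:%M'

# Hypothetical CSV content; the rows are deliberately out of chronological order
sample_csv = """15/08/2016 17:30,device-A,UP
14/08/2016 13:15,device-B,DOWN
14/08/2016 13:00,device-A,DOWN
"""


def first_and_last_seen(csv_file):
    """Return [device_name, first_seen, last_seen, status] per device."""
    entries = {}
    for alert_time, device_name, status in csv.reader(csv_file):
        stamp = datetime.strptime(alert_time, TIME_FORMAT)
        if device_name not in entries:
            entries[device_name] = {'first': stamp, 'last': stamp, 'status': status}
            continue
        entry = entries[device_name]
        if stamp < entry['first']:
            entry['first'] = stamp
        if stamp >= entry['last']:
            # The status always follows the most recent alert
            entry['last'] = stamp
            entry['status'] = status
    return [
        [name, e['first'].strftime(TIME_FORMAT), e['last'].strftime(TIME_FORMAT), e['status']]
        for name, e in entries.items()
    ]


# With a real file, open(path, newline='') would take the place of StringIO
result = first_and_last_seen(io.StringIO(sample_csv))
print(result)
```

Here device-A keeps the status UP from its latest alert, even though an older DOWN row appears later in the file.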