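I'm writing a program that takes a .csv file and creates "metrics" for ticket closure data. Each ticket has one or more time entries; the goal is to get the 'delta' (i.e. the time difference) for open -> close and for time_start -> time_end on a per-ticket basis. These are not real variable names, they are just for the purpose of this question.
So, say ticket 12345 has 3 time entries, like so: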
ticket: 12345
open: 2016-09-26 00:00:00.000 close: 2016-09-27 00:01:00.000
time_start: 2016-09-26 00:01:00.000 time_end: 2016-09-26 00:02:00.000
ticket: 12345
open: 2016-09-26 00:00:00.000 close: 2016-09-27 00:01:00.000
time_start: 2016-09-26 00:01:00.000 time_end: 2016-09-26 00:02:00.000
ticket: 12345
open: 2016-09-26 00:00:00.000 close: 2016-09-27 00:01:00.000
time_start: 2016-09-26 00:01:00.000 time_end: 2016-09-27 00:02:00.000
I'd like the program to display ONE entry for this ticket, with the 'deltas' added up, like so:
ticket: 12345
Delta open/close ($total time from open to close):
Delta start/end: ($total time of ALL ticket time entries added up)
Here's what I have so far.
.csv example:
Ticket #,Ticket Type,Opened,Closed,Time Entry Day,Start,End
737385,Software,2016-09-06 12:48:31.680,2016-09-06 15:41:52.933,2016-09-06 00:00:00.000,1900-01-01 15:02:00.417,1900-01-01 15:41:00.417
737318,Hardware,2016-09-06 12:20:28.403,2016-09-06 14:35:58.223,2016-09-06 00:00:00.000,1900-01-01 14:04:00.883,1900-01-01 14:35:00.883
737296,Printing/Scan/Fax,2016-09-06 11:37:10.387,2016-09-06 13:33:07.577,2016-09-06 00:00:00.000,1900-01-01 13:29:00.240,1900-01-01 13:33:00.240
737273,Software,2016-09-06 10:54:40.177,2016-09-06 13:28:24.140,2016-09-06 00:00:00.000,1900-01-01 13:17:00.860,1900-01-01 13:28:00.860
737261,Software,2016-09-06 10:33:09.070,2016-09-06 13:19:41.573,2016-09-06 00:00:00.000,1900-01-01 13:05:00.113,1900-01-01 13:15:00.113
737238,Software,2016-09-06 09:52:57.090,2016-09-06 14:42:16.287,2016-09-06 00:00:00.000,1900-01-01 12:01:00.350,1900-01-01 12:04:00.350
737238,Software,2016-09-06 09:52:57.090,2016-09-06 14:42:16.287,2016-09-06 00:00:00.000,1900-01-01 14:36:00.913,1900-01-01 14:42:00.913
737220,Password,2016-09-06 09:28:16.060,2016-09-06 11:41:16.750,2016-09-06 00:00:00.000,1900-01-01 11:30:00.303,1900-01-01 11:36:00.303
737197,Hardware,2016-09-06 08:50:23.197,2016-09-06 14:02:18.817,2016-09-06 00:00:00.000,1900-01-01 13:48:00.530,1900-01-01 14:02:00.530
736964,Internal,2016-09-06 01:02:27.453,2016-09-06 05:46:00.160,2016-09-06 00:00:00.000,1900-01-01 06:38:00.917,1900-01-01 06:45:00.917
class Time_Entry.py:
#! /usr/bin/python
from datetime import *

class Time_Entry:
    def __init__(self, ticket_no, time_entry_day, opened, closed, start, end):
        self.ticket_no = ticket_no
        self.time_entry_day = time_entry_day
        self.opened = opened
        self.closed = closed
        self.start = datetime.strptime(start, '%Y-%m-%d %H:%M:%S.%f')
        self.end = datetime.strptime(end, '%Y-%m-%d %H:%M:%S.%f')
        self.total_open_close_delta = 0
        self.total_start_end_delta = 0

    def open_close_delta(self, topen, tclose):
        open_time = datetime.strptime(topen, '%Y-%m-%d %H:%M:%S.%f')
        if tclose != '\\N':
            close_time = datetime.strptime(tclose, '%Y-%m-%d %H:%M:%S.%f')
            self.total_open_close_delta = close_time - open_time

    def start_end_delta(self, tstart, tend):
        start_time = datetime.strptime(tstart, '%Y-%m-%d %H:%M:%S.%f')
        end_time = datetime.strptime(tend, '%Y-%m-%d %H:%M:%S.%f')
        start_end_delta = (end_time - start_time).seconds
        self.total_start_end_delta += start_end_delta
        return (self.total_start_end_delta)

    def add_start_end_delta(self, delta):
        self.total_start_end_delta += delta

    def display(self):
        print('Ticket #: %7.7s Start: %-15s End: %-15s Delta: %-10s' % (self.ticket_no, self.start.time(), self.end.time(), self.total_start_end_delta))
Which is called by metrics.py:
#! /usr/bin/python
import csv
import pprint
from Time_Entry import *

file = '/home/jmd9qs/userdrive/metrics.csv'

# setup CSV, load up a list of dicts
reader = csv.DictReader(open(file))
dict_list = []
for line in reader:
    dict_list.append(line)

def load_tickets(ticket_list):
    for i, key in enumerate(ticket_list):
        ticket_no = key['Ticket #']
        time_entry_day = key['Time Entry Day']
        opened = key['Opened']
        closed = key['Closed']
        start = key['Start']
        end = key['End']
        time_entry = Time_Entry(ticket_no, time_entry_day, opened, closed, start, end)
        time_entry.open_close_delta(opened, closed)
        time_entry.start_end_delta(start, end)
        for h, key2 in enumerate(ticket_list):
            ticket_no2 = key2['Ticket #']
            time_entry_day2 = key2['Time Entry Day']
            opened2 = key2['Opened']
            closed2 = key2['Closed']
            start2 = key2['Start']
            end2 = key2['End']
            time_entry2 = Time_Entry(ticket_no2, time_entry_day2, opened2, closed2, start2, end2)
            if time_entry.ticket_no == time_entry2.ticket_no and i != h:
                # add delta and remove second time_entry from dict (no counting twice)
                time_entry2_delta = time_entry2.start_end_delta(start2, end2)
                time_entry.add_start_end_delta(time_entry2_delta)
                del dict_list[h]
        time_entry.display()

load_tickets(dict_list)
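So far this seems to work OK; however, I get multiple lines of output for each ticket instead of a single line with the 'deltas' added up. FYI, the way the program displays its output differs from my example above, which is intentional. See the example below: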
Ticket #: 738388 Start: 15:24:00.313000 End: 15:35:00.313000 Delta: 2400
Ticket #: 738388 Start: 16:30:00.593000 End: 16:40:00.593000 Delta: 1260
Ticket #: 738381 Start: 15:40:00.763000 End: 16:04:00.767000 Delta: 1440
Ticket #: 738357 Start: 13:50:00.717000 End: 14:10:00.717000 Delta: 1200
Ticket #: 738231 Start: 11:16:00.677000 End: 11:21:00.677000 Delta: 720
Ticket #: 738203 Start: 16:15:00.710000 End: 16:31:00.710000 Delta: 2160
Ticket #: 738203 Start: 09:57:00.060000 End: 10:02:00.060000 Delta: 1560
Ticket #: 738203 Start: 12:26:00.597000 End: 12:31:00.597000 Delta: 900
Ticket #: 738135 Start: 13:25:00.880000 End: 13:50:00.880000 Delta: 2040
Ticket #: 738124 Start: 07:56:00.117000 End: 08:31:00.117000 Delta: 2100
Ticket #: 738121 Start: 07:47:00.903000 End: 07:52:00.903000 Delta: 300
Ticket #: 738115 Start: 07:15:00.443000 End: 07:20:00.443000 Delta: 300
Ticket #: 737926 Start: 06:40:00.813000 End: 06:47:00.813000 Delta: 420
Ticket #: 737684 Start: 18:50:00.060000 End: 20:10:00.060000 Delta: 13380
Ticket #: 737684 Start: 13:00:00.560000 End: 13:08:00.560000 Delta: 8880
Ticket #: 737684 Start: 08:45:00 End: 10:00:00 Delta: 9480
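Notice that several tickets appear with more than one entry, which is what I don't want.
Any notes on style, convention, etc. are also welcome, as I'm trying to be more 'Pythonic'.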
Answer 0 (score: 2)
The problem here is that with a nested loop like the one you implemented, you examine the same ticket twice. Let me explain it better:
ticket_list = [111111, 111111, 666666, 777777]  # let's simplify by considering the ids only

# I'm trying to keep the same variable names
for i, key1 in enumerate(ticket_list):  # outer loop
    cnt = 1
    for h, key2 in enumerate(ticket_list):  # inner loop
        if key1 == key2 and i != h:
            print('>> match on i:', i, '- h:', h)
            cnt += 1
    print('Found', key1, cnt, 'times')
See how it counts 111111 twice:
>> match on i: 0 - h: 1
Found 111111 2 times
>> match on i: 1 - h: 0
Found 111111 2 times
Found 666666 1 times
Found 777777 1 times
This happens because you match 111111 both when the outer loop is on the first position and the inner loop is on the second (i: 0, h: 1), and again when the outer loop is on the second position and the inner loop is on the first (i: 1, h: 0).
A better solution is to group the entries of the same ticket and then sum your deltas. groupby is ideal for this task. Here I took the liberty of rewriting some of your code.
I modified the constructor to accept the dictionary itself, which makes passing the parameters later less messy. I also removed the methods that add up the deltas; we'll see why later.
import csv
import itertools
from datetime import datetime

class Time_Entry(object):
    def __init__(self, entry):
        self.ticket_no = entry['Ticket #']
        self.time_entry_day = entry['Time Entry Day']
        self.opened = datetime.strptime(entry['Opened'], '%Y-%m-%d %H:%M:%S.%f')
        self.closed = datetime.strptime(entry['Closed'], '%Y-%m-%d %H:%M:%S.%f')
        self.start = datetime.strptime(entry['Start'], '%Y-%m-%d %H:%M:%S.%f')
        self.end = datetime.strptime(entry['End'], '%Y-%m-%d %H:%M:%S.%f')
        self.total_open_close_delta = (self.closed - self.opened).seconds
        self.total_start_end_delta = (self.end - self.start).seconds

    def display(self):
        print('Ticket #: %7.7s Start: %-15s End: %-15s Delta: %-10s' % (self.ticket_no, self.start.time(), self.end.time(), self.total_start_end_delta))
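One caveat about the `.seconds` attribute used in this constructor: it holds only the seconds component of a `timedelta` and leaves out whole days, so a ticket that stays open for more than 24 hours would be under-reported. A minimal sketch of the difference, using the open and close timestamps from the ticket 12345 example in the question (the names `FMT`, `seconds_component` and `full_seconds` are only for illustration):

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S.%f'

# Timestamps taken from the ticket 12345 example in the question.
opened = datetime.strptime('2016-09-26 00:00:00.000', FMT)
closed = datetime.strptime('2016-09-27 00:01:00.000', FMT)

delta = closed - opened  # 1 day and 1 minute

seconds_component = delta.seconds     # leaves out the whole day
full_seconds = delta.total_seconds()  # includes the whole day

print(seconds_component)  # 60
print(full_seconds)       # 86460.0
```

If tickets can span more than a day, `int(delta.total_seconds())` is probably the safer choice; for the same-day sample rows both give the same whole number.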
Here we load the data using a list comprehension; the final result is a list of Time_Entry objects:
with open('metrics.csv') as ticket_list:
    time_entry_list = [Time_Entry(line) for line in csv.DictReader(ticket_list)]

print(time_entry_list)
# [<Time_Entry object at 0x101142f60>, <Time_Entry object at 0x10114d048>, <Time_Entry object at 0x1011fddd8>, ... ]
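To try the loading step without a file on disk, the same kind of list comprehension can be run against an in-memory CSV. This is only a sketch: `io.StringIO` stands in for the open file, `SAMPLE` holds two rows copied from the question's .csv, and the rows are kept as plain dictionaries so that the snippet is self-contained.

```python
import csv
import io

# Two rows copied from the sample .csv in the question.
SAMPLE = """Ticket #,Ticket Type,Opened,Closed,Time Entry Day,Start,End
737238,Software,2016-09-06 09:52:57.090,2016-09-06 14:42:16.287,2016-09-06 00:00:00.000,1900-01-01 12:01:00.350,1900-01-01 12:04:00.350
737238,Software,2016-09-06 09:52:57.090,2016-09-06 14:42:16.287,2016-09-06 00:00:00.000,1900-01-01 14:36:00.913,1900-01-01 14:42:00.913
"""

# io.StringIO behaves like an open text file; DictReader keys each row by the header line.
rows = [row for row in csv.DictReader(io.StringIO(SAMPLE))]

print(len(rows))            # 2
print(rows[0]['Ticket #'])  # 737238
print(rows[1]['Start'])     # 1900-01-01 14:36:00.913
```

Each element of `rows` is the kind of dictionary that the rewritten constructor receives as `entry`.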
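In the nested loop version you kept rebuilding Time_Entry inside the inner loop, which means that for 100 entries you end up initializing 10000 temporary objects! Creating the list "outside" instead lets us initialize each Time_Entry only once.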
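The 100 versus 10000 figure can be checked with a small counting sketch. `Counted` is a hypothetical stand-in for Time_Entry that only records how many times its constructor runs, and the 100 dummy rows are made up for the illustration.

```python
class Counted:
    """Hypothetical stand-in for Time_Entry that counts constructions."""
    created = 0

    def __init__(self, row):
        Counted.created += 1
        self.row = row


rows = list(range(100))  # 100 dummy entries

# Nested-loop style: a fresh object for every (outer, inner) pair.
Counted.created = 0
for _ in rows:
    for inner in rows:
        Counted(inner)
nested_count = Counted.created

# List-comprehension style: one object per entry.
Counted.created = 0
entries = [Counted(row) for row in rows]
single_pass_count = Counted.created

print(nested_count, single_pass_count)  # 10000 100
```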
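Here comes the magic: we can use groupby to collect all the objects with the same ticket_no in the same list. groupby only merges consecutive items with equal keys, so the list has to be sorted by that key first (note that sorted returns a new list, so its result must be kept):
time_entry_list = sorted(time_entry_list, key=lambda x: x.ticket_no)
ticket_grps = itertools.groupby(time_entry_list, key=lambda x: x.ticket_no)
tickets = [(ticket_no, list(entries)) for ticket_no, entries in ticket_grps]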
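The sorting step matters because `itertools.groupby` starts a new group every time the key changes, so equal keys that are not adjacent end up in separate groups. A short sketch with three made-up positions of ticket ids (plain strings rather than Time_Entry objects):

```python
import itertools

ticket_ids = ['737238', '737220', '737238']  # same id twice, but not adjacent

# Without sorting, the two '737238' items land in different groups.
unsorted_groups = [(key, len(list(grp))) for key, grp in itertools.groupby(ticket_ids)]

# After sorting, equal ids are adjacent and are grouped together.
sorted_groups = [(key, len(list(grp))) for key, grp in itertools.groupby(sorted(ticket_ids))]

print(unsorted_groups)  # [('737238', 1), ('737220', 1), ('737238', 1)]
print(sorted_groups)    # [('737220', 1), ('737238', 2)]
```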
The final result in tickets is a list of tuples, with the ticket id in the first position and the list of associated Time_Entry objects in the last:
print(tickets)
# [('737385', [<Time_Entry object at 0x101142f60>]),
# ('737318', [<Time_Entry object at 0x10114d048>]),
# ('737238', [<Time_Entry object at 0x1011fdd68>, <Time_Entry object at 0x1011fde80>]),
# ...]
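As an aside, the same list-of-tuples shape can be built without sorting by collecting entries in a `collections.defaultdict`. This is an alternative sketch, not what the answer uses; the `(ticket id, seconds)` pairs stand in for Time_Entry objects, and their order is made up.

```python
from collections import defaultdict

# (ticket id, seconds) pairs standing in for Time_Entry objects.
entries = [('737238', 180), ('737220', 360), ('737238', 360)]

by_ticket = defaultdict(list)
for ticket_no, seconds in entries:
    by_ticket[ticket_no].append(seconds)  # entries need not be adjacent

# Sort only the final (id, list) pairs, for a stable display order.
tickets = sorted(by_ticket.items())
print(tickets)  # [('737220', [360]), ('737238', [180, 360])]
```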
So finally we can iterate over all the tickets and, again using a list comprehension, build a list containing only the deltas so that we can sum them. You can see why we removed the old method that updated the deltas: each entry now simply stores its own value, and we sum the values externally.
for ticket in tickets:
    print('ticket:', ticket[0])
    # extract list of deltas and then sum
    print('Delta open / close:', sum([entry.total_open_close_delta for entry in ticket[1]]))
    print('Delta start / end:', sum([entry.total_start_end_delta for entry in ticket[1]]))
    print('(found {} occurrences)'.format(len(ticket[1])))
    print()
Here is your result:
ticket: 736964
Delta open / close: 17012
Delta start / end: 420
(found 1 occurrences)
ticket: 737197
Delta open / close: 18715
Delta start / end: 840
(found 1 occurrences)
ticket: 737220
Delta open / close: 7980
Delta start / end: 360
(found 1 occurrences)
ticket: 737238
Delta open / close: 34718
Delta start / end: 540
(found 2 occurrences)
ticket: 737261
Delta open / close: 9992
Delta start / end: 600
(found 1 occurrences)
ticket: 737273
Delta open / close: 9223
Delta start / end: 660
(found 1 occurrences)
ticket: 737296
Delta open / close: 6957
Delta start / end: 240
(found 1 occurrences)
ticket: 737318
Delta open / close: 8129
Delta start / end: 1860
(found 1 occurrences)
ticket: 737385
Delta open / close: 10401
Delta start / end: 2340
(found 1 occurrences)
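One thing to watch in this output: every row of a ticket repeats the same Opened/Closed pair, so summing the open/close delta over the entries counts it once per row. Ticket 737238 has two rows, and its 34718 is exactly 2 x 17359. A hedged sketch of one way around that, reading open/close once per ticket and summing only start/end; `seconds_between` is a helper introduced here for illustration, and the rows are the two 737238 rows from the sample .csv:

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S.%f'


def seconds_between(start, end):
    """Whole seconds between two timestamp strings in the .csv format."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# The two rows for ticket 737238 from the sample: (Opened, Closed, Start, End).
rows = [
    ('2016-09-06 09:52:57.090', '2016-09-06 14:42:16.287',
     '1900-01-01 12:01:00.350', '1900-01-01 12:04:00.350'),
    ('2016-09-06 09:52:57.090', '2016-09-06 14:42:16.287',
     '1900-01-01 14:36:00.913', '1900-01-01 14:42:00.913'),
]

# Open/close is a property of the ticket: read it once, from the first row.
open_close = seconds_between(rows[0][0], rows[0][1])
# Start/end belongs to each time entry: add these up.
start_end = sum(seconds_between(start, end) for _, _, start, end in rows)

print(open_close)  # 17359
print(start_end)   # 540
```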
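The moral of the story: list comprehensions can be extremely useful, and they let you do a lot with a very compact syntax. The Python standard library also contains plenty of ready-to-use tools that can really help you, so get familiar with it!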