Python 3: nested dictionary comparison (recursion?)

Date: 2016-10-07 17:47:34

Tags: python list csv dictionary recursion

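I'm writing a program that takes a .csv file and creates 'metrics' for ticket closure data. Each ticket has one or more time entries; the goal is to get the 'delta' (i.e. the time difference) for open -> close and for time_start -> time_end on a per-ticket basis. These are not the real variable names; they are just for the purposes of this question.

So, say we have ticket 12345 with 3 time entries, as follows: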

ticket: 12345
open: 2016-09-26 00:00:00.000
close: 2016-09-27 00:01:00.000
time_start: 2016-09-26 00:01:00.000
time_end: 2016-09-26 00:02:00.000

ticket: 12345
open: 2016-09-26 00:00:00.000
close: 2016-09-27 00:01:00.000
time_start: 2016-09-26 00:01:00.000
time_end: 2016-09-26 00:02:00.000

ticket: 12345
open: 2016-09-26 00:00:00.000
close: 2016-09-27 00:01:00.000
time_start: 2016-09-26 00:01:00.000
time_end: 2016-09-27 00:02:00.000

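I would like the program to display a single entry for this ticket, with the 'deltas' added up, as follows:

ticket: 12345
Delta open/close ($total time from open to close):
Delta start/end: ($total time of ALL ticket time entries added up)

Here is what I have so far:

.csv sample: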

Ticket #,Ticket Type,Opened,Closed,Time Entry Day,Start,End
737385,Software,2016-09-06 12:48:31.680,2016-09-06 15:41:52.933,2016-09-06 00:00:00.000,1900-01-01 15:02:00.417,1900-01-01 15:41:00.417
737318,Hardware,2016-09-06 12:20:28.403,2016-09-06 14:35:58.223,2016-09-06 00:00:00.000,1900-01-01 14:04:00.883,1900-01-01 14:35:00.883
737296,Printing/Scan/Fax,2016-09-06 11:37:10.387,2016-09-06 13:33:07.577,2016-09-06 00:00:00.000,1900-01-01 13:29:00.240,1900-01-01 13:33:00.240
737273,Software,2016-09-06 10:54:40.177,2016-09-06 13:28:24.140,2016-09-06 00:00:00.000,1900-01-01 13:17:00.860,1900-01-01 13:28:00.860
737261,Software,2016-09-06 10:33:09.070,2016-09-06 13:19:41.573,2016-09-06 00:00:00.000,1900-01-01 13:05:00.113,1900-01-01 13:15:00.113
737238,Software,2016-09-06 09:52:57.090,2016-09-06 14:42:16.287,2016-09-06 00:00:00.000,1900-01-01 12:01:00.350,1900-01-01 12:04:00.350
737238,Software,2016-09-06 09:52:57.090,2016-09-06 14:42:16.287,2016-09-06 00:00:00.000,1900-01-01 14:36:00.913,1900-01-01 14:42:00.913
737220,Password,2016-09-06 09:28:16.060,2016-09-06 11:41:16.750,2016-09-06 00:00:00.000,1900-01-01 11:30:00.303,1900-01-01 11:36:00.303
737197,Hardware,2016-09-06 08:50:23.197,2016-09-06 14:02:18.817,2016-09-06 00:00:00.000,1900-01-01 13:48:00.530,1900-01-01 14:02:00.530
736964,Internal,2016-09-06 01:02:27.453,2016-09-06 05:46:00.160,2016-09-06 00:00:00.000,1900-01-01 06:38:00.917,1900-01-01 06:45:00.917

class Time_Entry.py:

#! /usr/bin/python
from datetime import *

class Time_Entry:

    def __init__(self, ticket_no, time_entry_day, opened, closed, start, end):
        self.ticket_no = ticket_no
        self.time_entry_day = time_entry_day
        self.opened = opened
        self.closed = closed
        self.start = datetime.strptime(start, '%Y-%m-%d %H:%M:%S.%f')
        self.end = datetime.strptime(end, '%Y-%m-%d %H:%M:%S.%f')
        self.total_open_close_delta = 0
        self.total_start_end_delta = 0

    def open_close_delta(self, topen, tclose):
        open_time = datetime.strptime(topen, '%Y-%m-%d %H:%M:%S.%f')
        if tclose != '\\N':
            close_time = datetime.strptime(tclose, '%Y-%m-%d %H:%M:%S.%f')
            self.total_open_close_delta = close_time - open_time

    def start_end_delta(self, tstart, tend):
        start_time = datetime.strptime(tstart, '%Y-%m-%d %H:%M:%S.%f')
        end_time = datetime.strptime(tend, '%Y-%m-%d %H:%M:%S.%f')
        start_end_delta = (end_time - start_time).seconds
        self.total_start_end_delta += start_end_delta
        return (self.total_start_end_delta)

    def add_start_end_delta(self, delta):
        self.total_start_end_delta += delta

    def display(self):
        print('Ticket #: %7.7s Start: %-15s End: %-15s Delta: %-10s' % (self.ticket_no, self.start.time(), self.end.time(), self.total_start_end_delta))

Called by metrics.py:

#! /usr/bin/python

import csv
import pprint
from Time_Entry import *

file = '/home/jmd9qs/userdrive/metrics.csv'

# setup CSV, load up a list of dicts
reader = csv.DictReader(open(file))
dict_list = []

for line in reader:
    dict_list.append(line)

def load_tickets(ticket_list):
    for i, key in enumerate(ticket_list):
        ticket_no = key['Ticket #']
        time_entry_day = key['Time Entry Day']
        opened = key['Opened']
        closed = key['Closed']
        start = key['Start']
        end = key['End']

        time_entry = Time_Entry(ticket_no, time_entry_day, opened, closed, start, end)
        time_entry.open_close_delta(opened, closed)
        time_entry.start_end_delta(start, end)

        for h, key2 in enumerate(ticket_list):
            ticket_no2 = key2['Ticket #']
            time_entry_day2 = key2['Time Entry Day']
            opened2 = key2['Opened']
            closed2 = key2['Closed']
            start2 = key2['Start']
            end2 = key2['End']
            time_entry2 = Time_Entry(ticket_no2, time_entry_day2, opened2, closed2, start2, end2)

            if time_entry.ticket_no == time_entry2.ticket_no and i != h:
                # add delta and remove second time_entry from dict (no counting twice)
                time_entry2_delta = time_entry2.start_end_delta(start2, end2)
                time_entry.add_start_end_delta(time_entry2_delta)
                del dict_list[h]
        time_entry.display()

load_tickets(dict_list)

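So far this seems to work OK; however, I get multiple lines of output per ticket instead of a single line with the 'deltas' added up. FYI, the way the program displays its output differs from my example above, which is intentional. See the example below: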

Ticket #:  738388 Start: 15:24:00.313000 End: 15:35:00.313000 Delta: 2400      
Ticket #:  738388 Start: 16:30:00.593000 End: 16:40:00.593000 Delta: 1260      
Ticket #:  738381 Start: 15:40:00.763000 End: 16:04:00.767000 Delta: 1440      
Ticket #:  738357 Start: 13:50:00.717000 End: 14:10:00.717000 Delta: 1200      
Ticket #:  738231 Start: 11:16:00.677000 End: 11:21:00.677000 Delta: 720       
Ticket #:  738203 Start: 16:15:00.710000 End: 16:31:00.710000 Delta: 2160      
Ticket #:  738203 Start: 09:57:00.060000 End: 10:02:00.060000 Delta: 1560      
Ticket #:  738203 Start: 12:26:00.597000 End: 12:31:00.597000 Delta: 900       
Ticket #:  738135 Start: 13:25:00.880000 End: 13:50:00.880000 Delta: 2040      
Ticket #:  738124 Start: 07:56:00.117000 End: 08:31:00.117000 Delta: 2100      
Ticket #:  738121 Start: 07:47:00.903000 End: 07:52:00.903000 Delta: 300       
Ticket #:  738115 Start: 07:15:00.443000 End: 07:20:00.443000 Delta: 300       
Ticket #:  737926 Start: 06:40:00.813000 End: 06:47:00.813000 Delta: 420       
Ticket #:  737684 Start: 18:50:00.060000 End: 20:10:00.060000 Delta: 13380     
Ticket #:  737684 Start: 13:00:00.560000 End: 13:08:00.560000 Delta: 8880      
Ticket #:  737684 Start: 08:45:00        End: 10:00:00        Delta: 9480      

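Notice that several tickets appear with more than one entry, which is what I don't want.

Any notes on style, conventions, etc. are also welcome, as I'm trying to be more 'Pythonic'.

1 answer:

Answer 0 (score: 2)

The issue here is that with a nested loop like the one you implemented, you examine the same ticket twice. Let me explain it better: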

ticket_list = [111111, 111111, 666666, 777777] # lets simplify considering the ids only

# I'm trying to keep the same variable names
for i, key1 in enumerate(ticket_list): # outer loop

    cnt = 1

    for h, key2 in enumerate(ticket_list): # inner loop
        if key1 == key2 and i != h:
            print('>> match on i:', i, '- h:', h)
            cnt += 1

    print('Found', key1, cnt, 'times')

See how it counts 111111 twice:

>> match on i: 0 - h: 1
Found 111111 2 times
>> match on i: 1 - h: 0
Found 111111 2 times
Found 666666 1 times
Found 777777 1 times

This happens because 111111 matches once when the outer loop is at the first position and the inner loop at the second (i: 0, h: 1), and again when the outer loop is at the second position and the inner loop at the first (i: 1, h: 0).

Suggested solution

A better solution is to group the entries of the same ticket together and then sum up your deltas. groupby (from itertools) is ideal for your task. I took the liberty of rewriting some bits of the code.

Here I modified the constructor to accept the dictionary itself, which makes passing the parameters less messy later on. I also removed the method that adds up the deltas; we'll see why later:

import csv
import itertools
from datetime import *

class Time_Entry(object):

    def __init__(self, entry):
        self.ticket_no = entry['Ticket #']
        self.time_entry_day = entry['Time Entry Day']
        self.opened = datetime.strptime(entry['Opened'], '%Y-%m-%d %H:%M:%S.%f')
        self.closed = datetime.strptime(entry['Closed'], '%Y-%m-%d %H:%M:%S.%f')
        self.start = datetime.strptime(entry['Start'], '%Y-%m-%d %H:%M:%S.%f')
        self.end = datetime.strptime(entry['End'], '%Y-%m-%d %H:%M:%S.%f')
        self.total_open_close_delta = (self.closed - self.opened).seconds
        self.total_start_end_delta = (self.end - self.start).seconds

    def display(self):
        print('Ticket #: %7.7s Start: %-15s End: %-15s Delta: %-10s' % (self.ticket_no, self.start.time(), self.end.time(), self.total_start_end_delta))
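As a quick sanity check of a dictionary-based constructor like this, here is a minimal sketch that feeds it one row of the sample .csv through csv.DictReader. The in-memory SAMPLE string and the io.StringIO wrapper are assumptions made for illustration (the real program reads metrics.csv from disk), and the explicit `from datetime import datetime` stands in for the wildcard import.

```python
import csv
import io
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S.%f'

# Header plus one row copied from the sample .csv in the question.
SAMPLE = (
    "Ticket #,Ticket Type,Opened,Closed,Time Entry Day,Start,End\n"
    "737385,Software,2016-09-06 12:48:31.680,2016-09-06 15:41:52.933,"
    "2016-09-06 00:00:00.000,1900-01-01 15:02:00.417,1900-01-01 15:41:00.417\n"
)


class Time_Entry(object):
    """One time entry, built directly from a csv.DictReader row."""

    def __init__(self, entry):
        self.ticket_no = entry['Ticket #']
        self.opened = datetime.strptime(entry['Opened'], FMT)
        self.closed = datetime.strptime(entry['Closed'], FMT)
        self.start = datetime.strptime(entry['Start'], FMT)
        self.end = datetime.strptime(entry['End'], FMT)
        # .seconds is the whole-second part of the difference (days excluded)
        self.total_open_close_delta = (self.closed - self.opened).seconds
        self.total_start_end_delta = (self.end - self.start).seconds


# DictReader yields one dict per row, keyed by the header names.
row = next(csv.DictReader(io.StringIO(SAMPLE)))
entry = Time_Entry(row)
print(entry.ticket_no, entry.total_open_close_delta, entry.total_start_end_delta)
# 737385 10401 2340
```

The two numbers agree with the values reported for ticket 737385 in the answer's output.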

Here we load the data using a list comprehension; the final output is a list of Time_Entry objects:

with open('metrics.csv') as ticket_list:
    time_entry_list = [Time_Entry(line) for line in csv.DictReader(ticket_list)]

print(time_entry_list)
# [<Time_Entry object at 0x101142f60>, <Time_Entry object at 0x10114d048>, <Time_Entry object at 0x1011fddd8>, ... ]

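In the nested loop version you kept rebuilding Time_Entry objects inside the inner loop, which means that for 100 entries you end up initializing 10000 temporary objects! Creating the list "outside" instead lets us initialize each Time_Entry only once.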
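To make the cost concrete, here is a small sketch that counts constructor calls. Counted is a hypothetical stand-in for Time_Entry, and the sketch ignores the deletions the original loop performs while iterating; it only illustrates how the number of constructions grows.

```python
class Counted:
    """Hypothetical stand-in for Time_Entry that counts how often it is built."""
    created = 0

    def __init__(self, row):
        Counted.created += 1
        self.row = row


rows = list(range(100))

# Nested-loop style: one object per outer row, plus one per inner row.
for a in rows:
    outer = Counted(a)
    for b in rows:
        inner = Counted(b)
nested_total = Counted.created

# Building the list once up front: one object per row.
Counted.created = 0
entries = [Counted(r) for r in rows]
single_pass_total = Counted.created

print(nested_total, single_pass_total)
# 10100 100
```

The inner loop alone accounts for the 10000 constructions; the outer loop adds another 100.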

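Here comes the magic: we can use groupby to collect all the objects with the same ticket_no in the same list:

# groupby only groups adjacent items, so sort by the same key first
# (sorted() returns a new list, so keep the result)
time_entry_list = sorted(time_entry_list, key=lambda x: x.ticket_no)
ticket_grps = itertools.groupby(time_entry_list, key=lambda x: x.ticket_no)
tickets = [(ticket_no, list(entries)) for ticket_no, entries in ticket_grps]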
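One detail worth checking here: itertools.groupby only merges items that sit next to each other, so the input has to be sorted by the grouping key first. A minimal sketch with plain (ticket_no, seconds) tuples; the sample values and the ticket_key helper are made up for illustration:

```python
import itertools

# Hypothetical (ticket_no, seconds) pairs; the two '737238' rows are not adjacent.
entries = [('737238', 180), ('737220', 360), ('737238', 360)]


def ticket_key(entry):
    """Grouping key: the ticket number."""
    return entry[0]


# Without sorting, the same ticket shows up as two separate groups.
unsorted_groups = [
    (key, [seconds for _, seconds in group])
    for key, group in itertools.groupby(entries, key=ticket_key)
]

# After sorting by the same key, each ticket forms exactly one group.
sorted_groups = [
    (key, [seconds for _, seconds in group])
    for key, group in itertools.groupby(sorted(entries, key=ticket_key), key=ticket_key)
]

print(unsorted_groups)  # [('737238', [180]), ('737220', [360]), ('737238', [360])]
print(sorted_groups)    # [('737220', [360]), ('737238', [180, 360])]
```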

The final result in tickets is a list of tuples, with the ticket id in the first position and the list of associated Time_Entry objects in the last:

print(tickets)
# [('737385', [<Time_Entry object at 0x101142f60>]),
#  ('737318', [<Time_Entry object at 0x10114d048>]),
#  ('737238', [<Time_Entry object at 0x1011fdd68>, <Time_Entry object at 0x1011fde80>]),
#  ...]

So finally we can iterate over all the tickets and, using a list comprehension again, build a list containing only the deltas so that we can sum them. You can see why we removed the old method for updating the deltas: now we simply store the value for each single entry and then sum them externally.

Here is your result:

for ticket in tickets:
    print('ticket:', ticket[0])
    # extract list of deltas and then sum
    print('Delta open / close:', sum([entry.total_open_close_delta for entry in ticket[1]]))
    print('Delta start / end:', sum([entry.total_start_end_delta for entry in ticket[1]]))
    print('(found {} occurrences)'.format(len(ticket[1])))
    print()

Output:

ticket: 736964
Delta open / close: 17012
Delta start / end: 420
(found 1 occurrences)

ticket: 737197
Delta open / close: 18715
Delta start / end: 840
(found 1 occurrences)

ticket: 737220
Delta open / close: 7980
Delta start / end: 360
(found 1 occurrences)

ticket: 737238
Delta open / close: 34718
Delta start / end: 540
(found 2 occurrences)

ticket: 737261
Delta open / close: 9992
Delta start / end: 600
(found 1 occurrences)

ticket: 737273
Delta open / close: 9223
Delta start / end: 660
(found 1 occurrences)

ticket: 737296
Delta open / close: 6957
Delta start / end: 240
(found 1 occurrences)

ticket: 737318
Delta open / close: 8129
Delta start / end: 1860
(found 1 occurrences)

ticket: 737385
Delta open / close: 10401
Delta start / end: 2340
(found 1 occurrences)
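Two things in this output are worth a second look. Ticket 737238 reports 34718 seconds from open to close, which is its single open/close interval (17359 seconds) counted once per time entry. Also, timedelta.seconds excludes whole days, so a ticket that stays open for more than 24 hours (like 12345 in the question) would be under-reported. Below is a hedged sketch of one way to handle both, assuming the open/close interval should be taken once per ticket and using timedelta.total_seconds(). The summarize and seconds_between helpers and the row tuples are illustrative names, not part of the original code.

```python
from datetime import datetime
from itertools import groupby

FMT = '%Y-%m-%d %H:%M:%S.%f'

# (ticket_no, opened, closed, start, end) for the two 737238 rows of the sample .csv
rows = [
    ('737238', '2016-09-06 09:52:57.090', '2016-09-06 14:42:16.287',
     '1900-01-01 12:01:00.350', '1900-01-01 12:04:00.350'),
    ('737238', '2016-09-06 09:52:57.090', '2016-09-06 14:42:16.287',
     '1900-01-01 14:36:00.913', '1900-01-01 14:42:00.913'),
]


def seconds_between(first, second):
    """Full elapsed time in seconds, including any whole days."""
    delta = datetime.strptime(second, FMT) - datetime.strptime(first, FMT)
    return delta.total_seconds()


def summarize(rows):
    """Map ticket_no -> (open/close seconds, summed start/end seconds)."""
    by_ticket = lambda r: r[0]
    result = {}
    for ticket_no, group in groupby(sorted(rows, key=by_ticket), key=by_ticket):
        group = list(group)
        # Opened/Closed repeat on every row of a ticket, so read them once.
        open_close = seconds_between(group[0][1], group[0][2])
        # Start/End differ per time entry, so these are added up.
        start_end = sum(seconds_between(r[3], r[4]) for r in group)
        result[ticket_no] = (open_close, start_end)
    return result


summary = summarize(rows)
print(summary)
```

With these two rows the start/end total is still 540 seconds, while the open/close figure is about 17359.2 seconds rather than twice that.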

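The moral of the story: list comprehensions can be very useful, and they let you do a lot with a very compact syntax. Also, the Python standard library contains plenty of ready-made tools that can really help you, so get familiar with it!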
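As one more example of a ready-made standard-library tool, collections.defaultdict can accumulate per-ticket totals in a single pass without sorting first. This is an alternative to groupby, not what the answer above uses; the (ticket_no, seconds) pairs are illustrative, with the start/end durations worked out from four rows of the sample .csv.

```python
from collections import defaultdict

# (ticket_no, start/end seconds) for four rows of the sample .csv
deltas = [('737261', 600), ('737238', 180), ('737238', 360), ('737220', 360)]

# Missing keys start at 0, so each delta can be added without a membership check.
totals = defaultdict(int)
for ticket_no, seconds in deltas:
    totals[ticket_no] += seconds

for ticket_no in sorted(totals):
    print('ticket:', ticket_no, 'Delta start / end:', totals[ticket_no])
# ticket: 737220 Delta start / end: 360
# ticket: 737238 Delta start / end: 540
# ticket: 737261 Delta start / end: 600
```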