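Build a dictionary from a list, merging duplicate entries

Time: 2015-01-27 05:26:47

Tags: python list

I'm new to Python. I have the following list of tuples and want to build a dictionary from it that contains no duplicate ids. When a duplicate id is found, its data must be merged into the entry created for the first occurrence. For example, id 41 appears twice, once with Pending = 1 and once with Delivered = 14. I want the result to contain a single entry for 41 that holds the Pending and Delivered counts plus the sum of the two.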

    temp = [
    (41, 1, "2015-1-22 12:37:58.631670", 'Pending'),
    (37, 1, "2015-1-21 13:56:3.632057", 'Delivered'),
    (41, 14, "2015-1-22 12:37:58.631670", 'Delivered'),
    (36, 1, "2015-1-21 13:22:52.705818", 'Delivered'),
    (40, 2, "2015-1-22 12:37:58.631670", 'Delivered'),
    (38, 1, "2015-1-21 14:4:10.206100", 'Delivered')
    ]

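The first column is the id, the second is the count for that status, and the status is one of Pending, Delivered, or Failed.

I want to build a dictionary like this: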

dict = {id : { id : id, Pending : pending_count, Failed : failed_count, Delivered : delivered_count, total : pending+failed+delivered, date-time : date-time}}

For example:

dict = {
41 : { 'id' : 41, 'Pending' : 1, 'Failed' : 0, 'Delivered' : 14, 'total' : 15, 'date time' : '2015-1-22 12:37:58.631670'},
37 : { 'id' : 37, 'Pending' : 0, 'Failed' : 0, 'Delivered' : 1, 'total' : 1, 'date time' : '2015-1-21 13:56:3.632057'}
}

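2 Answers:

Answer 0 (score: 2)

The input list has a constant structure: every element is a tuple. The first item of each tuple (the id) becomes the key of the output dictionary, and each value of the output dictionary is itself a dictionary.

  1. Iterate over every item in the temp list.
  2. Read the count according to the status and assign it to the respective counter, using if statements.
  3. If the key already exists in the output dictionary, update the existing value, i.e. update all the count values as well as the total and the date.
  4. If it does not exist, add a new entry to the output dictionary.
  5. Use try/except to handle the exception raised when the key is not yet in the output dictionary.
  6. Code: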

    import pprint
    
    temp = [
    (41, 1, "2015-1-22 12:37:58.631670", 'Pending'), 
    (37, 1, "2015-1-21 13:56:3.632057", 'Delivered'), 
    (41, 14, "2015-1-22 12:37:58.631670", 'Delivered'), 
    (36, 1, "2015-1-21 13:22:52.705818", 'Delivered'), 
    (40, 2, "2015-1-22 12:37:58.631670", 'Delivered'), 
    (38, 1, "2015-1-21 14:4:10.206100", 'Delivered')
    ]
    
    output = {}
    for row_id, count, date_v, status in temp:
        # Only the counter matching this row's status receives the count.
        p_count = 0
        d_count = 0
        f_count = 0
        if status == "Pending":
            p_count = count
        elif status == "Delivered":
            d_count = count
        elif status == "Failed":
            f_count = count

        try:
            # Key already present: add this row's counts to the existing entry.
            output[row_id]["Pending"] = output[row_id]["Pending"] + p_count
            output[row_id]["Failed"] = output[row_id]["Failed"] + f_count
            output[row_id]["Delivered"] = output[row_id]["Delivered"] + d_count
            output[row_id]["total"] = output[row_id]["total"] + count
            output[row_id]["date time"] = date_v
        except KeyError:
            # First time this id is seen: create its entry.
            output[row_id] = {'id': row_id, 'Pending': p_count, 'Failed': f_count,
                              'Delivered': d_count, 'total': count, 'date time': date_v}
    
    
    pprint.pprint(output)
    

    Output:

    {36: {'Delivered': 1,
          'Failed': 0,
          'Pending': 0,
          'date time': '2015-1-21 13:22:52.705818',
          'id': 36,
          'total': 1},
     37: {'Delivered': 1,
          'Failed': 0,
          'Pending': 0,
          'date time': '2015-1-21 13:56:3.632057',
          'id': 37,
          'total': 1},
     38: {'Delivered': 1,
          'Failed': 0,
          'Pending': 0,
          'date time': '2015-1-21 14:4:10.206100',
          'id': 38,
          'total': 1},
     40: {'Delivered': 2,
          'Failed': 0,
          'Pending': 0,
          'date time': '2015-1-22 12:37:58.631670',
          'id': 40,
          'total': 2},
     41: {'Delivered': 14,
          'Failed': 0,
          'Pending': 1,
          'date time': '2015-1-22 12:37:58.631670',
          'id': 41,
          'total': 15}}
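
As a variation on the steps above, the try/except branch can be replaced with `dict.setdefault`, which creates the zeroed entry the first time an id is seen. This is only a sketch under a few assumptions: the function name `summarize` and the shortened `rows` list are made up for illustration, and every status is assumed to be one of Pending, Failed, or Delivered.

```python
def summarize(rows):
    """Merge (id, count, date_time, status) tuples into one dict per id."""
    output = {}
    for row_id, count, date_time, status in rows:
        # setdefault returns the existing entry, or stores and returns
        # the zeroed one the first time this id is seen.
        entry = output.setdefault(row_id, {
            "id": row_id, "Pending": 0, "Failed": 0,
            "Delivered": 0, "total": 0, "date time": date_time,
        })
        entry[status] += count          # assumes status is one of the three keys
        entry["total"] += count
        entry["date time"] = date_time  # keep the most recently seen value
    return output


# Shortened sample data (illustrative subset of the question's list).
rows = [
    (41, 1, "2015-1-22 12:37:58.631670", "Pending"),
    (41, 14, "2015-1-22 12:37:58.631670", "Delivered"),
    (37, 1, "2015-1-21 13:56:3.632057", "Delivered"),
]
result = summarize(rows)
print(result[41]["Pending"], result[41]["Delivered"], result[41]["total"])
# prints: 1 14 15
```

An unknown status would raise a `KeyError` here instead of being silently ignored, which may or may not be what you want.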
    

Answer 1 (score: 1)

I think what you want is:

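from collections import defaultdict

# temp is the list of (id, count, date_time, status) tuples from the question
d = defaultdict(dict)

for row in temp:
    result = d[row[0]]  # per-id dict, created on first access
    # add this row's count to its status and to the running total
    result[row[-1]] = result.setdefault(row[-1], 0) + row[1]
    result['total'] = result.setdefault('total', 0) + row[1]
    # remember the date per status, e.g. 'Pending-date'
    result['{}-date'.format(row[-1])] = row[2]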

For d[41], this gives you:

{'Delivered': 14,
 'total': 15,
 'Pending-date': '2015-1-22 12:37:58.631670',
 'Pending': 1,
 'Delivered-date': '2015-1-22 12:37:58.631670'}
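
Note that the result above has no 'Failed' or 'id' key, because only statuses that actually occur are stored. If the exact shape from the question is needed, one possible adjustment is to give `defaultdict` a factory that starts every counter at zero. The following is a sketch, not part of the original answer: `zeroed_entry`, `count_by_id`, and the shortened `rows` list are hypothetical names, and a single 'date time' key is assumed instead of the per-status dates.

```python
from collections import defaultdict


def zeroed_entry():
    # Hypothetical factory: every status starts at 0, so statuses that
    # never occur for an id still appear in its entry.
    return {"Pending": 0, "Failed": 0, "Delivered": 0, "total": 0}


def count_by_id(rows):
    d = defaultdict(zeroed_entry)
    for row_id, count, date_time, status in rows:
        entry = d[row_id]               # created by zeroed_entry on first access
        entry["id"] = row_id
        entry[status] += count          # assumes a known status name
        entry["total"] += count
        entry["date time"] = date_time  # assumption: one date per id, last seen wins
    return dict(d)                      # plain dict, so missing ids raise KeyError


# Shortened sample data (illustrative subset of the question's list).
rows = [
    (41, 1, "2015-1-22 12:37:58.631670", "Pending"),
    (41, 14, "2015-1-22 12:37:58.631670", "Delivered"),
]
counts = count_by_id(rows)
print(counts[41]["Failed"], counts[41]["total"])
# prints: 0 15
```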