Question

I am programming in Python and have a list of lists:

a=[[1234,32.5,'John',1114],[1234,16.3,'John',1115],[1235,25.3,'John',1116],
  [1239,16.3,'Lisa',1117]]

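How can I merge the sublists that share the same element at index [0], and drop the sublist with the smallest element at index [3]?

Expected output: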

a=[[1234,48.8,'John',1115],[1235,25.3,'John',1116],[1239,16.3,'Lisa',1117]]

Since

 a[1][3] > a[0][3] (1115 > 1114)

a[0][1] is added to a[1][1], and a[0] is removed entirely.

I plan to use this on ten thousand lists.

Edit:

What I have tried:

old=[[1234,32.5,'John',1114],[1234,16.3,'John',1115],[1235,25.3,'John',1116],[1239,16.3,'Lisa',1117]]

memory=old[0]

new=[]

for x, t in enumerate(old):
    if t==memory:
        new.append([t[0],memory[1]+t[1],t[2],t[3]])
        memory=t

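However, this does not work when more than two lists share the same value at index [0]; the code would have to run repeatedly, depending on how many similar elements there are. In my application, the list of lists will contain hundreds of entries that are similar at a given index.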

Answer 1

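What you really need is to group your data by a common key. itertools.groupby does exactly that, and you can use operator.itemgetter to group by the key elements of each sublist.

Once you iterate over the groups, keeping track of the running sum and the maximum is not hard. This assumes you mean to keep the largest value rather than remove the smallest, because with more than two elements in a group, removing only the smallest would be impossible to reconcile with summing the second elements.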

from itertools import groupby
from operator import itemgetter

def merge(data):
    out_data = []
    for _, group in groupby(data, key=itemgetter(0, 2)):
        key_num, to_sum, key_name, to_max = next(group)
        for _, sum_val, _, max_val in group:
            to_sum += sum_val
            to_max = max(to_max, max_val)
        out_data.append([key_num, to_sum, key_name, to_max])
    return out_data

Demo

>>> a = [[1234,32.5,'John',1114],
         [1234,16.3,'John',1115], 
         [1235,25.3,'John',1116], 
         [1239,16.3,'Lisa',1117]]

>>> merge(a)
[[1234, 48.8, 'John', 1115],
 [1235, 25.3, 'John', 1116],
 [1239, 16.3, 'Lisa', 1117]]
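
One caveat with this approach: itertools.groupby only groups rows that are adjacent, so input that is not already ordered by the key needs to be sorted first. The following is a minimal sketch under that assumption; the name merge_unsorted and the sample rows are hypothetical and not part of the original answer.

```python
from itertools import groupby
from operator import itemgetter


def merge_unsorted(data):
    """Sort by the grouping key, then sum index 1 and keep the max of index 3."""
    key = itemgetter(0, 2)
    out_data = []
    # groupby only merges adjacent rows, so sort by the same key first.
    for (key_num, key_name), group in groupby(sorted(data, key=key), key=key):
        rows = list(group)
        total = sum(row[1] for row in rows)
        latest = max(row[3] for row in rows)
        out_data.append([key_num, total, key_name, latest])
    return out_data


# Hypothetical sample: the two 1234 rows are not adjacent.
rows = [[1234, 32.5, 'John', 1114],
        [1239, 16.3, 'Lisa', 1117],
        [1234, 16.3, 'John', 1115]]
result = merge_unsorted(rows)
print(result)
```

Sorting costs O(n log n), which should be fine for the ten thousand rows mentioned in the question.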

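It is worth noting that if you have many operations to apply to tabular data like this, you may want to look at the Pandas library. With Pandas, a concise solution to your problem could be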

import pandas as pd

def pd_merge(data):
    df = pd.DataFrame(data)    
    # Group on columns 0 and 2 (passed as a list), then restore column order.
    return (df.groupby([0, 2], as_index=False)
              .agg({1: 'sum', 3: 'max'})
              .sort_index(axis=1))

Answer 2

Here is my solution, which seems to handle more than two elements:

from collections import defaultdict

a=[[1234,32.5,'John',1114], [1234,32.5,'John',1113],[1234,16.3,'John',1115],[1235,25.3,'John',1116],  [1239,16.3,'Lisa',1117]]

def merge_list(data):
    total_dic = defaultdict(list)
    new_data = []
    for elem in data:  # iterate over the argument, not the global a
        total_dic[elem[0]].append(elem)

    for dic_elem in total_dic:
        total_dic[dic_elem].sort(key=lambda x: x[3], reverse=False)
        if(len(total_dic[dic_elem]) > 1):
            new_data.append(total_dic[dic_elem][1:])
        else:
            new_data.append(total_dic[dic_elem][0])
    return new_data

print(merge_list(a))

[[[1234, 32.5, 'John', 1114], [1234, 16.3, 'John', 1115]], [1235, 25.3, 'John', 1116], [1239, 16.3, 'Lisa', 1117]]
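
The output above keeps the remaining rows of each group nested rather than merged. As a rough sketch of how the same defaultdict grouping could be extended to produce one row per key, summing index [1] and keeping the row with the largest index [3], something like the following might work. The name merge_by_first is hypothetical, and the choice to keep the largest index [3] is an assumption taken from the question's expected output.

```python
from collections import defaultdict


def merge_by_first(data):
    """Group rows by index 0, sum index 1, keep the row with the largest index 3."""
    groups = defaultdict(list)
    for row in data:
        groups[row[0]].append(row)

    merged = []
    for rows in groups.values():
        newest = max(rows, key=lambda r: r[3])  # row whose index 3 is largest
        total = sum(r[1] for r in rows)         # sum of index 1 over the group
        merged.append([newest[0], total, newest[2], newest[3]])
    return merged


sample = [[1234, 32.5, 'John', 1114], [1234, 32.5, 'John', 1113],
          [1234, 16.3, 'John', 1115], [1235, 25.3, 'John', 1116],
          [1239, 16.3, 'Lisa', 1117]]
merged = merge_by_first(sample)
print(merged)
```

Because a dictionary collects the groups, the input does not need to be sorted, and on Python 3.7+ the keys come out in first-seen order.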

Answer 3

Thanks everyone, I managed to answer my own question by using groupby from itertools.

Here is my working prototype:


from itertools import groupby
from operator import itemgetter


def merge(data):
    out_data = []
    for key, group in groupby(data, key=itemgetter('name','time')):
        id_temp = 0
        dep_temp=0

        dict_temp={}
        for t in group:
            dict_temp=t
            if t["deposit_id"] < id_temp:
                dict_temp['deposit_id']=id_temp
            else:
                id_temp=dict_temp['deposit_id']
            dep_temp+=dict_temp['deposit']
        dict_temp['deposit'], dict_temp['deposit_id'] = dep_temp, id_temp
        out_data.append(dict_temp)
    return out_data

a = [{'name':'John','time':1234,'deposit':16.7,'deposit_id':1115},
 {'name':'John','time':1234,'deposit':24.3,'deposit_id':1116},
 {'name':'John','time':1234,'deposit':65.3,'deposit_id':1117},
 {'name':'John','time':1235,'deposit':95.3,'deposit_id':1118},
 {'name':'Lisa','time':1235,'deposit':95.3,'deposit_id':1119}]

b=merge(a)

for t in b:
    print(t)
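
Note that the prototype above writes the totals back into the last dictionary of each group, so the input records are modified. If that is not wanted, a variant that builds new dictionaries could look like the sketch below. The name merge_records is hypothetical, and sorting by the key first is an added assumption so that unsorted input also groups correctly.

```python
from itertools import groupby
from operator import itemgetter


def merge_records(records):
    """Return new dicts; the input dicts are left unchanged."""
    key = itemgetter('name', 'time')
    merged = []
    # Sort by the grouping key so that groupby sees equal keys side by side.
    for (name, time), group in groupby(sorted(records, key=key), key=key):
        group = list(group)
        merged.append({
            'name': name,
            'time': time,
            'deposit': sum(r['deposit'] for r in group),
            'deposit_id': max(r['deposit_id'] for r in group),
        })
    return merged


records = [
    {'name': 'John', 'time': 1234, 'deposit': 16.7, 'deposit_id': 1115},
    {'name': 'John', 'time': 1234, 'deposit': 24.3, 'deposit_id': 1116},
    {'name': 'Lisa', 'time': 1235, 'deposit': 95.3, 'deposit_id': 1119},
]
merged_records = merge_records(records)
for record in merged_records:
    print(record)
```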

Adding the same list index if the lists are similar at another specific index
