Remove duplicates and merge multiple lists into one?

Time: 2017-10-28 21:50:21

Tags: python duplicates

How can I remove duplicates and merge multiple lists into one:

function([["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]) should return exactly

[["good", ["me.txt", "money.txt"]], ["hello", ["me.txt"]], ["rep", ["money.txt"]]]

8 answers:

Answer 0 (score: 1)

The simplest way is to use defaultdict:

>>> from collections import defaultdict
>>> l = [["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]
>>> d = defaultdict(list)
>>> for i, j in l:
...     d[i].append(j)    # append the value to the list stored under the key
...
>>> d
defaultdict(<class 'list'>, {'hello': ['me.txt'], 'good': ['me.txt', 'money.txt'],
                             'rep': ['money.txt']})
>>> # to get it as a list of [key, values] pairs
>>> out = [[key, d[key]] for key in d]
>>> out
[['hello', ['me.txt']], ['good', ['me.txt', 'money.txt']], ['rep', ['money.txt']]]

Answer 1 (score: 0)

Try this (no libraries needed):

your_input_data = [ ["hello","me.txt"], ["good","me.txt"], ["good","me.txt"], ["good","money.txt"], ["rep", "money.txt"] ]


my_dict = {}
for box in your_input_data:

    if box[0] in my_dict:

        buffer_items = []
        for items in box[1:]:
            if items not in my_dict[box[0]]:
                buffer_items.append(items)

        remove_dup = list(set(buffer_items + my_dict[box[0]]))
        my_dict[box[0]] = remove_dup

    else:

        buffer_items = []
        for items in box[1:]:
            buffer_items.append(items)

        remove_dup = list(set(buffer_items))

        my_dict[box[0]] = remove_dup


last_point = [[keys, values] for keys, values in my_dict.items()]

print(last_point)
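
The set() calls above remove duplicates, but a set does not keep the original order of the file names. If the order matters, a minimal sketch along the same lines could use dict.fromkeys as an ordered de-duplicator instead. The name merge_rows is made up for this illustration, and the sketch assumes Python 3.7+ (where dicts keep insertion order) and hashable values:

```python
def merge_rows(rows):
    """Group rows by their first element, keeping each value once, in first-seen order."""
    merged = {}
    for key, *values in rows:
        # dict.fromkeys keeps only the first occurrence of each value and preserves order
        merged[key] = list(dict.fromkeys(merged.get(key, []) + values))
    return [[key, values] for key, values in merged.items()]


rows = [["hello", "me.txt"], ["good", "me.txt"], ["good", "me.txt"],
        ["good", "money.txt"], ["rep", "money.txt"]]
print(merge_rows(rows))
```

Like the loop above, this also accepts rows with more than two elements, since everything after the first element is treated as a value.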

Good luck...

Answer 2 (score: -1)

You can also use a plain dictionary.

In [30]: l1 = [["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]

In [31]: d2 = {}

In [32]: for i, j in l1:
    ...:     if i not in d2:
    ...:         d2[i] = [j]          # first value for this key: start a new list
    ...:     else:
    ...:         d2[i].append(j)      # key already seen: add to its list
    ...:         

In [33]: d2
Out[33]: {'good': ['me.txt', 'money.txt'], 'hello': ['me.txt'], 'rep': ['money.txt']}

In [34]: out = [ [key,d2[key]] for key in d2]

In [35]: out
Out[35]: 
[['rep', ['money.txt']],
['hello', ['me.txt']],
['good', ['me.txt', 'money.txt']]]
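
The if/else in a loop like this can also be collapsed with dict.setdefault, which guarantees that every key maps to a list, even when it has only one value. A small sketch (the names pairs and grouped are only for illustration; the key order shown by print assumes Python 3.7+):

```python
pairs = [["hello", "me.txt"], ["good", "me.txt"], ["good", "money.txt"], ["rep", "money.txt"]]

grouped = {}
for key, value in pairs:
    # setdefault returns the list already stored under key,
    # or stores a new empty list first and returns that
    grouped.setdefault(key, []).append(value)

out = [[key, grouped[key]] for key in grouped]
print(out)
```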

Answer 3 (score: -1)

Let's first understand the actual problem:

Example hint:

There is a pattern for this kind of list problem:

Suppose you have a list:

a=[(2006,1),(2007,4),(2008,9),(2006,5)]

and you want to convert it into a dict, with the first element of each tuple as the key and the second element as the value. Something like:

{2008: [9], 2006: [5], 2007: [4]}

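But there is a catch: you also want to keep the values that share a key. For example, (2006,1) and (2006,5) have the same key but different values. You want those values collected under a single key, so the expected output is: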

{2008: [9], 2006: [1, 5], 2007: [4]}

For this kind of problem, we do the following:

First create a new dict, then follow this pattern:

if item[0] not in new_dict:
    new_dict[item[0]]=[item[1]]
else:
    new_dict[item[0]].append(item[1])

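So we first check whether the key is in the new dict. If it is not, we store the value in a new one-element list; if it already exists, we append the duplicate key's value to its list.

Full code: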

a=[(2006,1),(2007,4),(2008,9),(2006,5)]

new_dict={}

for item in a:
    if item[0] not in new_dict:
        new_dict[item[0]]=[item[1]]
    else:
        new_dict[item[0]].append(item[1])

print(new_dict)
  

Solution to your actual problem:

list_1=[["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]

no_duplicates={}

for item in list_1:
    if item[0] not in no_duplicates:
        # start a new list with every value after the key
        no_duplicates[item[0]]=list(item[1:])
    else:
        no_duplicates[item[0]].extend(item[1:])

list_result=[]
for key,value in no_duplicates.items():
    list_result.append([key,value])
print(list_result)

Output:

[['hello', ['me.txt']], ['rep', ['money.txt']], ['good', ['me.txt', 'money.txt']]]

Answer 4 (score: -1)

yourList=[["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]
expectedList=[["good", ["me.txt", "money.txt"]], ["hello", ["me.txt"]], ["rep", ["money.txt"]]]

def getall(allsec, listKey, uniqlist):
    if listKey not in uniqlist:
        uniqlist.append(listKey)
        return [listKey, [x[1] for x in allsec if x[0] == listKey]]

uniqlist=[]
result=sorted(list(filter(lambda x:x!=None, [getall(yourList,elem[0],uniqlist) for elem in yourList])))
print(result)
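
The approach above rescans the whole list for every new key. As an alternative sketch (not part of the original answer), the same sorted result can be built with itertools.groupby after sorting by key. The names your_list and group_sorted are only for illustration:

```python
from itertools import groupby
from operator import itemgetter

your_list = [["hello", "me.txt"], ["good", "me.txt"], ["good", "money.txt"], ["rep", "money.txt"]]


def group_sorted(pairs):
    """Return [key, [values...]] entries sorted by key."""
    # groupby only merges adjacent items with equal keys, so sort by key first;
    # sorted() is stable, so values keep their original relative order
    ordered = sorted(pairs, key=itemgetter(0))
    return [[key, [value for _, value in group]]
            for key, group in groupby(ordered, key=itemgetter(0))]


print(group_sorted(your_list))
```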

Hope this helps.

Answer 5 (score: -1)

A Python function that produces exactly the required output can be written as follows:

from collections import defaultdict

def function(data):    
    entries = defaultdict(list)

    for k, v in data:
        entries[k].append(v)

    return sorted([k, v] for k, v in entries.items())

print(function([["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]))

This displays the function's return value:

[['good', ['me.txt', 'money.txt']], ['hello', ['me.txt']], ['rep', ['money.txt']]]  

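It also ensures the keys are sorted. A dictionary is used to handle the removal of duplicate keys (since keys must be unique).

defaultdict() is used to simplify building the lists inside the dictionary. An alternative is to try appending the new value to an existing key and, if a KeyError exception is raised, add the new key, as follows: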

def function(data):    
    entries = {}

    for k, v in data:
        try:
            entries[k].append(v)
        except KeyError as e:
            entries[k] = [v]

    return sorted([k, v] for k, v in entries.items())

Answer 6 (score: -1)

This is easy to solve with a dict and sets.

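def combine_duplicates(given_list):
    data = {}
    for element_1, element_2 in given_list:
        # set.add() returns None, so do not assign its result back;
        # setdefault returns the set stored under the key, creating it if needed
        data.setdefault(element_1, set()).add(element_2)
    # sets are unordered, so sort the values to get a stable result
    return [[k, sorted(v)] for k, v in data.items()]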

Answer 7 (score: -4)

Create an empty array, push index 0 of each child array into it, and join to turn all the values into a single space-separated string.

var your_input_data = [ ["hello","hi", "jel"], ["good"], ["good2","lo"], ["good3","lt","ahhahah"], ["rep", "nice","gr8", "job"] ];

var myprint = []
for(var i in your_input_data){
   myprint.push(your_input_data[i][0]);
}
console.log(myprint.join(' '))