Remove duplicates from a nested list (without removing duplicate elements inside the sublists)

Asked: 2017-09-09 23:16:17

Tags: python list nested duplicates nested-lists

I am trying to remove duplicate sublists from a nested list like this one:

result_set = [
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
    ['Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics'],
    ['Microfabrication', 'Microfabrication', 'Microfabrication', 'Clean-Room Microfabrication', 'Microfabrication', 'Microfabrication'],
    ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography'],
    ['MEMS', 'MEMS', 'MEMS', 'MEMS']
    ]

The output I want is:

result_set = [
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
    ['Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics'],
    ['Microfabrication', 'Microfabrication', 'Microfabrication', 'Clean-Room Microfabrication', 'Microfabrication', 'Microfabrication'],
    ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography']
    ]

Note that the last element, ['MEMS', 'MEMS', 'MEMS', 'MEMS'], is gone. Similar questions have already been asked, and I adapted the following code from them:

result_set = set(frozenset(x) for x in result)
lst = [list(x) for x in result_set]

My problem is that I get the following output:

 result_set = [['MEMS'], ['Microfluidics'], ['Microfabrication', 'Clean-Room Microfabrication'], ['Photolithography', 'Lithography']]

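Note that it also removes the duplicate elements inside each sublist. I don't want that, because my goal afterwards is to plot a histogram showing, for example, that MEMS occurs 4 times. So I need to keep track of how many elements each sublist originally had.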

2 answers:

Answer 0 (score: 3)

If order doesn't matter, you can use a set:

final_data = list(map(list, set(map(tuple, result_set))))

Output:

[['Microfabrication', 'Microfabrication', 'Microfabrication', 'Clean-Room Microfabrication', 'Microfabrication', 'Microfabrication'], ['Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics'], ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography'], ['MEMS', 'MEMS', 'MEMS', 'MEMS']]
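
The difference from the attempt in the question is the key type: a frozenset collapses repeated elements, while a tuple keeps every element and its position, so the sublist lengths survive. A quick check on a single sublist taken from the question:

```python
sub = ['MEMS', 'MEMS', 'MEMS', 'MEMS']

as_frozenset = frozenset(sub)   # repeated elements collapse to one
as_tuple = tuple(sub)           # all four elements are kept

print(len(as_frozenset), len(as_tuple))   # 1 4
```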

If order does matter, you can try:

final_data = []
for result in result_set:
    if result not in final_data:
        final_data.append(result)

Output:

[['MEMS', 'MEMS', 'MEMS', 'MEMS'], ['Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics'], ['Microfabrication', 'Microfabrication', 'Microfabrication', 'Clean-Room Microfabrication', 'Microfabrication', 'Microfabrication'], ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography']]
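
The loop above compares each sublist against every sublist already kept, so it can get slow on long inputs. A minimal sketch of a variant that tracks seen sublists in a set instead, assuming every element is hashable (as the strings here are). The helper name dedupe_sublists and the shortened sample data are made up for illustration and are not part of the original answer:

```python
def dedupe_sublists(nested):
    """Drop repeated sublists, keeping first occurrences in their original order."""
    seen = set()       # tuple versions of the sublists already kept
    unique = []
    for sub in nested:
        key = tuple(sub)   # a tuple keeps order and repeats, unlike a frozenset
        if key not in seen:
            seen.add(key)
            unique.append(sub)
    return unique


# Shortened sample in the shape of the question's data
result_set = [
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
    ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography'],
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
]
final_data = dedupe_sublists(result_set)
print([len(sub) for sub in final_data])   # [4, 4]
```

Because the sublists themselves are kept untouched, their lengths are still available for the histogram afterwards.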

Answer 1 (score: 0)

Sort the list, then build a new list from the keys produced by itertools.groupby().

import itertools
result_set.sort()
new_set = [k for k,g in itertools.groupby(result_set)]
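
Two things are worth knowing about this approach: result_set.sort() reorders the original list in place, and groupby() only merges equal items that sit next to each other, which is why the sort is needed first. A small sketch that leaves the input untouched by sorting a copy instead. The sample data is a shortened version of the question's list, and the result comes back in sorted order, not the original order:

```python
import itertools

result_set = [
    ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography'],
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
]

# sorted() returns a new list, so result_set keeps its original order;
# groupby() then yields one key per run of equal, adjacent sublists
new_set = [key for key, _group in itertools.groupby(sorted(result_set))]
print(new_set)
```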