I am trying to remove duplicate sublists from a nested list like this one:
result_set = [
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
    ['Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics'],
    ['Microfabrication', 'Microfabrication', 'Microfabrication', 'Clean-Room Microfabrication', 'Microfabrication', 'Microfabrication'],
    ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography'],
    ['MEMS', 'MEMS', 'MEMS', 'MEMS']
]
The output I want is:
result_set = [
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
    ['Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics'],
    ['Microfabrication', 'Microfabrication', 'Microfabrication', 'Clean-Room Microfabrication', 'Microfabrication', 'Microfabrication'],
    ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography']
]
Note that the only difference is that the last element, ['MEMS', 'MEMS', 'MEMS', 'MEMS'], is gone. Similar questions have already been asked, and I adapted the following code from them:
result_set = set(frozenset(x) for x in result)
lst = [list(x) for x in result_set]
My problem is that I get the following output:
result_set = [['MEMS'], ['Microfluidics'], ['Microfabrication', 'Clean-Room Microfabrication'], ['Photolithography', 'Lithography']]
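Note that it also removes the duplicate elements inside each sublist. I don't want that, because my goal afterwards is to plot a histogram, e.g. MEMS occurs 4 times. So I need to keep track of how many elements each sublist originally had.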
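Answer 0 (score: 3)
If order doesn't matter, you can use a set. Converting each sublist to a tuple makes it hashable while keeping its repeated elements: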
final_data = list(map(list, set(map(tuple, result_set))))
Output (set iteration order is arbitrary, so the sublists may come out in a different order):
[['Microfabrication', 'Microfabrication', 'Microfabrication', 'Clean-Room Microfabrication', 'Microfabrication', 'Microfabrication'], ['Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics'], ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography'], ['MEMS', 'MEMS', 'MEMS', 'MEMS']]
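To illustrate why the tuple conversion keeps the counts intact while the frozenset version from the question does not, here is a minimal sketch on a single sublist taken from the question. The variable names `sub`, `as_frozenset`, and `as_tuple` are only for illustration.

```python
sub = ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography']

as_frozenset = frozenset(sub)  # hashable, but collapses repeats inside the sublist
as_tuple = tuple(sub)          # hashable, and keeps every element in order

print(len(as_frozenset))  # 2
print(len(as_tuple))      # 4
```

Both types can be stored in a set, but only the tuple still carries the per-sublist counts needed for a histogram.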
If order does matter, you can try:
final_data = []
for result in result_set:
    if result not in final_data:
        final_data.append(result)
Output:
[['MEMS', 'MEMS', 'MEMS', 'MEMS'], ['Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics'], ['Microfabrication', 'Microfabrication', 'Microfabrication', 'Clean-Room Microfabrication', 'Microfabrication', 'Microfabrication'], ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography']]
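The `not in` test on a list scans the whole list each time, so the loop above can get slow for large inputs. A possible variant, sketched here under the assumption that every sublist contains only hashable items such as strings, remembers the sublists already seen as tuples in a set. The function name `dedupe_preserving_order` and the shortened sample data are hypothetical, not part of the original answer.

```python
def dedupe_preserving_order(sublists):
    """Drop repeated sublists, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for sub in sublists:
        key = tuple(sub)  # tuples are hashable; lists are not
        if key not in seen:
            seen.add(key)
            unique.append(sub)
    return unique


# Shortened sample based on the data in the question.
result_set = [
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
    ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography'],
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
]

final_data = dedupe_preserving_order(result_set)
print(final_data)
```

The sublists themselves are appended unchanged, so each one keeps its original length for later counting.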
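Answer 1 (score: 0)
Sort the list, then build a new list from the keys produced by itertools.groupby().
import itertools
# groupby() only merges adjacent equal items, so sort first.
# Note: sort() reorders result_set in place.
result_set.sort()
new_set = [k for k, g in itertools.groupby(result_set)]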
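If the original list should stay untouched, one option is to group a sorted copy instead of sorting in place. This is only a sketch on a shortened sample based on the question's data; the result comes out in sorted order, not in the original order.

```python
import itertools

# Shortened sample based on the data in the question.
result_set = [
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
    ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography'],
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'],
]

# sorted() returns a new list, so result_set keeps its order and its duplicates.
new_set = [key for key, _group in itertools.groupby(sorted(result_set))]
print(new_set)
```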