I have a list of lists like this:
[[], [1, 2, 2], [1], [2], [2], [1, 2], [1, 2], [2, 1], [2, 2]]
I want to remove all duplicates where order doesn't matter, so in the list above I need to remove [2], [1, 2] and [2, 1].
I thought I could do it with Counter():
from collections import Counter

counter_list = []
no_dublicates = []
for sub_list in all_subsets:
    counter_dic = Counter(sub_list)
    if counter_dic in counter_list:
        pass
    else:
        no_dublicates.append(list(sub_list))
        counter_list.append(counter_dic)
This works fine... but it is the slowest part of my code. Is there a faster way to do this?
Answer 0 (score: 5)
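You can convert the Counter objects to frozensets, which are hashable and can be put in a set. That replaces the linear scan of the `in` check on a list with a set lookup that takes constant time on average: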
from collections import Counter

counters = set()
no_duplicates = []
for sub_list in all_subsets:
    c = frozenset(Counter(sub_list).items())
    if c not in counters:
        counters.add(c)
        no_duplicates.append(list(sub_list))
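For anyone who wants to run this directly, here is a minimal self-contained sketch of the same idea applied to the list from the question. The function name `dedupe_unordered` is just an illustrative choice, not something from the original code, and it assumes the elements of each sublist are hashable.

```python
from collections import Counter


def dedupe_unordered(lists):
    """Keep the first occurrence of each sublist, ignoring element order."""
    seen = set()      # frozenset keys of the multisets seen so far
    result = []
    for sub in lists:
        # (element, count) pairs identify the multiset regardless of order
        key = frozenset(Counter(sub).items())
        if key not in seen:
            seen.add(key)
            result.append(list(sub))
    return result


all_subsets = [[], [1, 2, 2], [1], [2], [2], [1, 2], [1, 2], [2, 1], [2, 2]]
print(dedupe_unordered(all_subsets))
# [[], [1, 2, 2], [1], [2], [1, 2], [2, 2]]
```

Because the key includes the counts, [2] and [2, 2] stay distinct, while [1, 2] and [2, 1] collapse into one entry.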
Doing it with a dict comprehension also looks cool:
no_duplicates = list(
    {frozenset(Counter(l).items()): l for l in all_subsets}.values())
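One detail worth knowing about the dict comprehension: when a key repeats, the later value overwrites the earlier one, so the sublist that survives is the last one seen for each key (in the position where the key first appeared, on Python 3.7+ where dicts preserve insertion order). The sketch below shows the difference and, as one possible alternative, uses `dict.setdefault` to keep the first occurrence instead. The variable names `last_wins` and `first_seen` are only illustrative.

```python
from collections import Counter

all_subsets = [[], [1, 2, 2], [1], [2], [2], [1, 2], [1, 2], [2, 1], [2, 2]]

# Dict comprehension: a repeated key keeps its slot but takes the latest value.
last_wins = list(
    {frozenset(Counter(l).items()): l for l in all_subsets}.values())
print(last_wins)
# [[], [1, 2, 2], [1], [2], [2, 1], [2, 2]]

# setdefault only stores a value when the key is not present yet,
# so the first sublist seen for each key is the one that is kept.
first_seen = {}
for l in all_subsets:
    first_seen.setdefault(frozenset(Counter(l).items()), l)
print(list(first_seen.values()))
# [[], [1, 2, 2], [1], [2], [1, 2], [2, 2]]
```

If it does not matter which ordering of a duplicate is kept, the two variants are interchangeable.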
Answer 1 (score: 0)
If you don't want to use the collections module, you could also try a simple solution like this:
lsts = [[], [1, 2, 2], [1], [2], [2], [1, 2], [1, 2], [2, 1], [2, 2]]

counts = {}
for sublist in lsts:
    key = tuple(sorted(sublist))
    counts[key] = counts.get(key, 0) + 1

result = []
for sublist in lsts:
    key = tuple(sorted(sublist))
    if counts[key] == 1:
        result.append(sublist)

print(result)
Which outputs:
[[], [1, 2, 2], [1], [2, 2]]
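Note that the code above drops every sublist that has a duplicate, so [2] and [1, 2] disappear entirely. If the goal is to keep one copy of each, as the question seems to ask, a possible variant using the same `tuple(sorted(...))` key could look like the sketch below. The function name `dedupe_sorted_key` is hypothetical, and the approach assumes the elements within a sublist can be compared with each other so that `sorted` works.

```python
def dedupe_sorted_key(lists):
    """Keep the first occurrence of each sublist, ignoring element order."""
    seen = set()      # sorted-tuple keys seen so far
    result = []
    for sub in lists:
        # Sorting gives the same tuple for every ordering of the same elements
        key = tuple(sorted(sub))
        if key not in seen:
            seen.add(key)
            result.append(sub)
    return result


lsts = [[], [1, 2, 2], [1], [2], [2], [1, 2], [1, 2], [2, 1], [2, 2]]
print(dedupe_sorted_key(lsts))
# [[], [1, 2, 2], [1], [2], [1, 2], [2, 2]]
```

Sorting each sublist costs O(k log k) for a sublist of length k, which is usually negligible for short sublists.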
Answer 2 (score: -2)
Why use any external module and make it so complicated when you can do it in just a few lines of code:
data_ = [[], [1, 2, 2], [1], [2], [2], [1, 2], [1, 2], [2, 1], [2, 2]]

dta_dict = {}
for j, i in enumerate(data_):
    # Map each order-independent key to the indices where it occurs
    key = tuple(sorted(i))
    if key not in dta_dict:
        dta_dict[key] = [j]
    else:
        dta_dict[key].append(j)

print(dta_dict.keys())
Output:
dict_keys([(1, 2), (), (1,), (2, 2), (1, 2, 2), (2,)])
If you want lists instead of tuples:
print(list(map(list, dta_dict.keys())))
Output:
[[1, 2], [], [1], [2, 2], [1, 2, 2], [2]]