Removing duplicates from a list of sets

Time: 2015-08-30 13:15:05

Tags: python list duplicates set unique

I have a list of sets:

L = [set([1, 4]), set([1, 4]), set([1, 2]), set([1, 2]), set([2, 4]), set([2, 4]), set([5, 6]), set([5, 6]), set([3, 6]), set([3, 6]), set([3, 5]), set([3, 5])]

(Actually, in my case it is the result of converting a list of reciprocal tuples.)

I want to remove the duplicates to get:

L = [set([1, 4]), set([1, 2]), set([2, 4]), set([5, 6]), set([3, 6]), set([3, 5])]

But if I try:

>>> list(set(L))
TypeError: unhashable type: 'set'

or

>>> list(np.unique(L))
TypeError: cannot compare sets using cmp()

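How can I get a list containing only the distinct sets?

4 answers:

Answer 0: (score: 18)

The best way is to convert your sets to frozensets (which are hashable) and then use set to keep only the unique ones, like this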

>>> list(set(frozenset(item) for item in L))
[frozenset({2, 4}),
 frozenset({3, 6}),
 frozenset({1, 2}),
 frozenset({5, 6}),
 frozenset({1, 4}),
 frozenset({3, 5})]

If you want them as sets, you can convert them back to set like this

>>> [set(item) for item in set(frozenset(item) for item in L)]
[{2, 4}, {3, 6}, {1, 2}, {5, 6}, {1, 4}, {3, 5}]

If you also want to preserve the order while removing duplicates, you can use collections.OrderedDict, like this

>>> from collections import OrderedDict
>>> [set(i) for i in OrderedDict.fromkeys(frozenset(item) for item in L)]
[{1, 4}, {1, 2}, {2, 4}, {5, 6}, {3, 6}, {3, 5}]
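
On Python 3.7 and later the built-in dict also preserves insertion order, so the same idea can probably be written without the import. The following is only a sketch under that assumption; the helper name unique_sets is hypothetical and not part of the answer.

```python
def unique_sets(sets):
    """Return the distinct sets in first-seen order (hypothetical helper)."""
    # frozenset is hashable, so it can be used as a dict key.
    # dict.fromkeys keeps the first occurrence of each key, in insertion
    # order (guaranteed from Python 3.7 onward).
    return [set(fs) for fs in dict.fromkeys(frozenset(s) for s in sets)]


L = [{1, 4}, {1, 4}, {1, 2}, {1, 2}, {2, 4}, {2, 4},
     {5, 6}, {5, 6}, {3, 6}, {3, 6}, {3, 5}, {3, 5}]
result = unique_sets(L)
print(result)
```

The result is a list of ordinary mutable sets, in the order in which each distinct set first appeared in L.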

Answer 1: (score: 3)

An alternative using a loop:

result = list()
for item in L:
    if item not in result:
        result.append(item)
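
The loop above compares each item against everything already in result, so its cost grows quadratically with the length of the list. A minimal sketch of a variant that keeps the same first-seen order but tracks what it has seen in a set of frozensets; the names dedupe_keep_order and seen are illustrative assumptions, not from the answer.

```python
def dedupe_keep_order(sets):
    """Drop repeated sets, keeping the first occurrence of each (hypothetical helper)."""
    seen = set()    # frozenset copies of the sets encountered so far
    result = []
    for item in sets:
        key = frozenset(item)  # hashable stand-in for the mutable set
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


L = [set([1, 4]), set([1, 4]), set([1, 2]), set([1, 2]), set([2, 4]), set([2, 4])]
deduped = dedupe_keep_order(L)
print(deduped)
```

Membership tests against seen are hash lookups, so this should scale roughly linearly, at the price of one extra frozenset per distinct item.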

Answer 2: (score: 1)

Here is another option

library(stringi)
library(dplyr)
library(magrittr)

data = structure(list(mystring = c("AASDAASADDLKJLKADDLKKLLKJLJADDLJLKJLADLKLADD", 
                                   "ASDSDFJSKADDKJSJKDFKSADDLKJFLAK"), class = c("cat", "dog")), .Names = c("mystring", 
                                                                                                            "class"), row.names = c(NA, -2L), class = "data.frame")

my_function = function(row)
  row$mystring %>% 
  stri_sub(to = 20) %>%
  stri_locate_all_fixed(pattern = "ADD") %>%
  extract2(1) %>%
  as_data_frame

test = 
  data %>%
  group_by(mystring) %>%
  do(my_function(.)) %>%
  left_join(data)

Answer 3: (score: 0)

Yet another option.

import itertools
list_sets = [set(['a', 'e', 'f']), set(['c', 'b', 'f']), set(['a', 'e', 'f']), set(['a', 'd']), set(['a', 'e', 'f'])]

lists = [sorted(s) for s in list_sets]  # convert each set to a sorted list so equal sets give identical lists
lists.sort()  # groupby only merges adjacent equal items, so sort first
lists_remove_duplicates = [key for key, _ in itertools.groupby(lists)]
print(lists_remove_duplicates)

# output
[['a', 'd'], ['a', 'e', 'f'], ['b', 'c', 'f']]
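
If the result should be sets rather than lists, the grouped keys can be converted back. This is only a sketch: it assumes every element is mutually comparable so that sorting works, and the name dedupe_by_sorting is hypothetical.

```python
import itertools


def dedupe_by_sorting(sets):
    """Deduplicate sets via sort + groupby (hypothetical helper)."""
    # Sort the elements inside each set so equal sets map to identical lists,
    # then sort the lists themselves so duplicates end up adjacent.
    keys = sorted(sorted(s) for s in sets)
    # groupby collapses each run of adjacent equal lists into one key.
    return [set(key) for key, _ in itertools.groupby(keys)]


list_sets = [set(['a', 'e', 'f']), set(['c', 'b', 'f']), set(['a', 'e', 'f']),
             set(['a', 'd']), set(['a', 'e', 'f'])]
unique = dedupe_by_sorting(list_sets)
print(unique)
```

Unlike the frozenset approach, this returns the sets in sorted order rather than in their original order, and it fails on elements that cannot be compared with each other.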