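Using Python, I want to do the following: build a tuple of sets. However, I only want to add a set to the tuple if that set is not already in the tuple. Each set is a pair. I'm using sets because the order within a pair doesn't matter. I'm using a tuple because I'm processing over 1.5 million rows of data, and tuples are faster to search than lists. I believe I'll still need some list comprehension, but that is one of my questions. My first question: my code is broken, so how do I fix it? My second question: how can I make my code more efficient?
I've simplified this example to just the basics. In the real code, each new set is received from a data source and then processed.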
my_tuple = ({"a", "b"}, {"c", "d"}, {"c", "e"}) # Existing tuple
new_set = {"b", "c"} # Get a set from data source
set_exists = any(new_set in a_set for a_set in my_tuple)
if not set_exists:
    my_tuple += (new_set,)
print(my_tuple)
({'a', 'b'}, {'c', 'd'}, {'c', 'e'}, {'b', 'c'})
That's fine: the set was not in the tuple, so it was added.
new_set = {"b", "a"} # Get a set from data source
set_exists = any(new_set in a_set for a_set in my_tuple)
if not set_exists:
    my_tuple += (new_set,)
print(my_tuple)
({'a', 'b'}, {'c', 'd'}, {'c', 'e'}, {'b', 'c'}, {'a', 'b'})
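Not good: the set already exists in the tuple (as {'a', 'b'}), so it should not have been added.
Thanks very much for your help.
Answer 0 (score: 3)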
The condition you should check is much simpler than you think:
set_exists = new_set in my_tuple
Your code should work with this change.
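A short sketch of why the original check misses duplicates: new_set in a_set asks whether new_set is an element of a_set, not whether the two sets are equal. Python's documentation notes that set membership accepts a set argument by temporarily converting it to a frozenset, so the test quietly returns False instead of raising an error. Tuple membership compares elements with ==, and set equality ignores order, so new_set in my_tuple does find the duplicate. The two incoming sets in the loop are stand-ins for the data source in the question.

```python
my_tuple = ({"a", "b"}, {"c", "d"}, {"c", "e"})
new_set = {"b", "a"}

# Original check: is new_set an ELEMENT of some set in the tuple?
# The elements are strings, never sets, so this is always False.
old_check = any(new_set in a_set for a_set in my_tuple)

# Corrected check: is some set in the tuple EQUAL to new_set?
# Tuple membership uses ==, and {"b", "a"} == {"a", "b"}.
new_check = new_set in my_tuple

print(old_check, new_check)  # False True

# Applying the corrected check to both sets from the question
for incoming in ({"b", "c"}, {"b", "a"}):  # stand-ins for the data source
    if incoming not in my_tuple:
        my_tuple += (incoming,)

print(len(my_tuple))  # 4: {"b", "c"} was added, {"b", "a"} was skipped
```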
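In any case, appending to a tuple is slow: tuples are immutable, so every += builds a new tuple and copies all existing elements into it. If you are after performance, your approach is definitely not the best one. One improvement would be to use a list, which has very fast appends, but like a tuple it has slow membership tests. In fact, contrary to what you believe, lists and tuples are equally slow to search, because both have to compare the elements one by one.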
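A quick identity check shows the difference in how the two types grow: tuple += produces a new object and leaves the old one untouched, while list.append modifies the existing list in place. The variable names here are only for the demonstration.

```python
t = (1, 2, 3)
t_before = t
t += (4,)                 # builds a new tuple containing all four items
print(t is t_before)      # False: t now names a different object
print(t_before)           # (1, 2, 3): the original tuple is unchanged

l = [1, 2, 3]
l_before = l
l.append(4)               # grows the same list object in place
print(l is l_before)      # True
print(l_before)           # [1, 2, 3, 4]
```

Because each tuple concatenation copies everything built so far, growing a tuple one item at a time costs quadratic time overall.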
The solution is to use a set of frozensets:
my_tuple = ({"a", "b"}, {"c", "d"}, {"c", "e"})
# convert to set, it's way faster!
# (this is a one-time operation, if possible, have your data in this format beforehand)
my_set = set(frozenset(s) for s in my_tuple)
# Again, if possible, get your data in the form of a frozenset so conversion is not needed
new_set = frozenset(("b", "c"))
if new_set not in my_set:  # very fast!
    my_set.add(new_set)
new_set = frozenset(("a", "b"))
my_set.add(new_set) # the check is actually unneeded for sets
print(my_set)
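For the streaming case in the question, where each pair arrives from a data source, the frozenset approach can be wrapped in a small helper. add_pair is a hypothetical name written for this sketch, not part of any library; it returns whether the pair was new, which helps if you only want to act on the first occurrence of each pair.

```python
def add_pair(seen: set, a: str, b: str) -> bool:
    """Record the unordered pair {a, b} in seen; return True if it was new.

    add_pair is a hypothetical helper written for this example.
    """
    pair = frozenset((a, b))  # order-insensitive and hashable
    if pair in seen:          # average O(1) hash lookup
        return False
    seen.add(pair)
    return True


seen = set()
print(add_pair(seen, "a", "b"))  # True: first time this pair appears
print(add_pair(seen, "b", "a"))  # False: same unordered pair
print(add_pair(seen, "b", "c"))  # True
print(len(seen))                 # 2
```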
Speed demonstration (IPython %timeit):
l = list(range(10 ** 6))
t = tuple(range(10 ** 6))
s = set(range(10 ** 6))
# Appending to tuple is slow!
%timeit global t; t += (1,)
11.4 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Appending to list is fast!
%timeit l.append(1)
107 ns ± 6.43 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# List and tuple membership tests are slow!
%timeit 500000 in l
5.9 ms ± 83.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit 500000 in t
6.62 ms ± 281 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# These operations are trivial for sets...
%timeit 500000 in s
73 ns ± 6.91 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
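The %timeit lines above are IPython magics. Outside IPython, a roughly equivalent measurement can be sketched with the standard-library timeit module. Absolute numbers depend on your machine and Python version, so no output is shown; the relative gap between the set and the sequences is what matters.

```python
import timeit

n = 10 ** 6
data = {
    "list": list(range(n)),
    "tuple": tuple(range(n)),
    "set": set(range(n)),
}

results = {}
for name, container in data.items():
    # Look for a value in the middle: list and tuple scan about n/2 items,
    # while the set does a single hash lookup.
    runs = timeit.repeat("500000 in c", globals={"c": container},
                         number=20, repeat=3)
    results[name] = min(runs) / 20  # best average seconds per lookup
    print(f"{name:>5}: {results[name] * 1e9:,.0f} ns per lookup")
```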
Answer 1 (score: 1)
You should just use a set of sets, or more precisely a set of frozensets, because regular sets are not hashable:
my_set = {frozenset(["a", "b"]), frozenset(["c", "d"]), frozenset(["c", "e"])}
my_set.add(frozenset(["b", "a"]))
print(my_set)
# >>> set([frozenset(['c', 'e']), frozenset(['a', 'b']), frozenset(['c', 'd'])])  (Python 2 repr; element order is arbitrary)
my_set.add(frozenset(["b", "z"]))
print(my_set)
# >>> set([frozenset(['c', 'e']), frozenset(['a', 'b']), frozenset(['b', 'z']), frozenset(['c', 'd'])])  (Python 2 repr; element order is arbitrary)
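To illustrate the hashability point: a regular set is mutable and therefore unhashable, so trying to store it inside another set raises TypeError, whereas a frozenset can be stored. Two frozensets with the same elements are equal and hash the same, so adding the reversed pair does not create a duplicate.

```python
outer = set()

try:
    outer.add({"a", "b"})        # a mutable set has no hash
    added_plain_set = True
except TypeError:
    added_plain_set = False      # raised because set objects are unhashable

outer.add(frozenset({"a", "b"}))
outer.add(frozenset({"b", "a"}))  # equal to the first, so nothing new is added
print(added_plain_set, len(outer))  # False 1
```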