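Is there a good way to get the symmetric difference of two sorted lists in Python and return the result as a sorted list? My current version feels like a poor workaround: convert to sets, take the symmetric difference, convert back to a list, then re-sort.
A solution using NumPy is fine. The data being sorted are integers.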
Answer 0 (score: 1)
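Yes, there is a way. You have to take advantage of the fact that both sequences are sorted: walk through the two of them together, comparing elements one pair at a time, and build the symmetric difference as you advance along each sequence.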
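If you are familiar with big O notation, the complexity of the code below is O(m+n), where m = len(seq1) and n = len(seq2). The complexity of your algorithm is O(log(m+n)*(m+n)), because you need to sort the resulting set.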
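For reference, the set-based approach described in the question can be sketched as follows. The function name `symmetric_difference_via_sets` is made up for this illustration; the final `sorted` call is the step that contributes the logarithmic factor.

```python
def symmetric_difference_via_sets(seq1, seq2):
    """Set-based baseline: ignores the fact that the inputs are already sorted."""
    # Building both sets and taking `^` is linear in m + n on average.
    unsorted_result = set(seq1) ^ set(seq2)
    # Sorting the k resulting values costs O(k log k), with k <= m + n.
    return sorted(unsorted_result)


print(symmetric_difference_via_sets([1, 2, 3], [2, 3, 4]))  # [1, 4]
```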
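Warning:
This answer is mainly an exercise in demonstrating how to take advantage of sorted input.
In spite of its better complexity, for most inputs it will run slower than the original poster's code, which uses Python's built-in set methods. In Python, sets are implemented in C under the hood, and pure Python will struggle to beat that. It would take very large inputs to see any advantage (if one is visible at all). This algorithm is asymptotically the most efficient, but that does not mean it is faster, nor that you should use it: the built-in set methods are optimized, battle-tested C code, and they make for code that is easier to write, read, understand, debug and maintain.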
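One way to check the speed claim on your own inputs is a small timing harness. This is only a sketch: `best_time` and `make_sorted_ints` are hypothetical helper names, the input sizes are arbitrary, and the numbers will vary by machine and Python version. Any candidate that takes two sequences can be passed in the same way as the set-based lambda.

```python
import random
import timeit


def best_time(func, seq1, seq2, repeats=5):
    """Return the fastest of `repeats` single calls to func(seq1, seq2), in seconds."""
    return min(timeit.repeat(lambda: func(seq1, seq2), number=1, repeat=repeats))


def make_sorted_ints(size, upper, seed):
    """Build a sorted list of `size` distinct ints drawn from range(upper)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(upper), size))


if __name__ == "__main__":
    a = make_sorted_ints(100_000, 1_000_000, seed=1)
    b = make_sorted_ints(100_000, 1_000_000, seed=2)
    # Baseline: the set-based approach from the question.
    print(best_time(lambda x, y: sorted(set(x) ^ set(y)), a, b))
```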
def get_symmetric_difference(seq1, seq2):
    """
    Computes the symmetric difference of the unique elements of seq1 & seq2
    as a new sorted list, without mutating the parameters.

    seq1: a sorted sequence of int
    seq2: a sorted sequence of int
    return: a new sorted list containing the symmetric difference
            of the unique elements of seq1 & seq2
    """
    symmetric_difference = []
    idx = 0
    jdx = 0
    last_insert = None  # last value appended to the result
    last_seen = None    # last value found in both sequences (or scanned in the tail)

    # Merge step: advance through both sequences together.
    while idx < len(seq1) and jdx < len(seq2):
        s1 = seq1[idx]
        s2 = seq2[jdx]
        if s1 == s2:
            # Common value: skip it in both sequences.
            idx += 1
            jdx += 1
            last_seen = s1
        elif s1 < s2:
            # s1 cannot appear later in seq2; keep it unless it is a duplicate.
            if last_insert != s1 and last_seen != s1:
                symmetric_difference.append(s1)
                last_insert = s1
            idx += 1
        else:
            # s2 < s1: same reasoning with the roles swapped.
            if last_insert != s2 and last_seen != s2:
                symmetric_difference.append(s2)
                last_insert = s2
            jdx += 1

    # At most one sequence still has elements. This also covers empty inputs,
    # so duplicates are dropped in that case too.
    remaining = seq1[idx:] if idx < len(seq1) else seq2[jdx:]
    for elt in remaining:
        if last_insert != elt and last_seen != elt:
            symmetric_difference.append(elt)
            last_insert = elt
        last_seen = elt
    return symmetric_difference
def test_get_symmetric_difference():
    seq1 = []
    seq2 = []
    assert get_symmetric_difference(seq1, seq2) == []
    seq1 = [1]
    seq2 = []
    assert get_symmetric_difference(seq1, seq2) == [1]
    seq1 = [1, 2, 3, 4]
    seq2 = [-2, -1, 5, 6, 7, 8]
    assert get_symmetric_difference(seq1, seq2) == [-2, -1, 1, 2, 3, 4, 5, 6, 7, 8]
    seq1 = [-1, 1, 2, 3, 4, 6, 9, 22, 34]
    seq2 = [-2, -1, 5, 6, 7, 8, 19, 22, 43]
    assert get_symmetric_difference(seq1, seq2) == [-2, 1, 2, 3, 4, 5, 7, 8, 9, 19, 34, 43]
    seq1 = [-2, -1, 5, 6, 7, 8, 19, 22, 43]
    seq2 = [-1, 1, 2, 3, 4, 6, 9, 22, 34]
    assert get_symmetric_difference(seq1, seq2) == [-2, 1, 2, 3, 4, 5, 7, 8, 9, 19, 34, 43]
    seq1 = [-2, -1, 0, 5, 22, 34]
    seq2 = [-2, -1, 1, 2, 3, 4, 6, 9, 22, 34]
    assert get_symmetric_difference(seq1, seq2) == [0, 1, 2, 3, 4, 5, 6, 9]
    seq1 = [-2, -1, 1, 2, 3, 4, 6, 9, 22, 34]
    seq2 = [-2, -1, 1, 2, 3, 4, 6, 9, 22, 34]
    assert get_symmetric_difference(seq1, seq2) == []
    seq1 = [7, 7, 7, 7, 7, 7]
    seq2 = [-2, -1, 1, 2, 3, 4, 6, 9, 22, 34]
    assert get_symmetric_difference(seq1, seq2) == [-2, -1, 1, 2, 3, 4, 6, 7, 9, 22, 34]
    seq1 = [-2, -1, 1, 2, 3, 4, 6, 9, 22, 34]
    seq2 = [7, 7, 7, 7, 7, 7]
    assert get_symmetric_difference(seq1, seq2) == [-2, -1, 1, 2, 3, 4, 6, 7, 9, 22, 34]
    seq1 = [-2, -1, 1, 2, 3, 4, 6, 9, 22, 34]
    seq2 = [-1, -1, 7, 7, 43, 43, 43]
    assert get_symmetric_difference(seq1, seq2) == [-2, 1, 2, 3, 4, 6, 7, 9, 22, 34, 43]
    seq1 = [34, 34, 34, 34]
    seq2 = [7, 34]
    assert get_symmetric_difference(seq1, seq2) == [7]
    seq1 = [7, 34]
    seq2 = [34, 34, 34, 34]
    assert get_symmetric_difference(seq1, seq2) == [7]
    seq1 = [7, 34]
    seq2 = [7, 7, 7, 7, 7]
    assert get_symmetric_difference(seq1, seq2) == [34]
    seq1 = [7, 7, 7, 7, 34]
    seq2 = [7, 7]
    assert get_symmetric_difference(seq1, seq2) == [34]
    print("***all tests pass***")


test_get_symmetric_difference()
***all tests pass***
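The same merge idea can also be written more compactly with the standard library. The sketch below is an alternative formulation, not the code from this answer: `symdiff_sorted` is a made-up name, and it assumes both inputs are sorted in ascending order. It is still pure Python driving the loop, so the performance caveat above applies to it as well.

```python
from heapq import merge
from itertools import groupby


def symdiff_sorted(seq1, seq2):
    """Symmetric difference of two ascending sequences, returned as a sorted list."""
    # groupby collapses runs of equal values, so each input contributes
    # every distinct value exactly once.
    unique1 = (key for key, _ in groupby(seq1))
    unique2 = (key for key, _ in groupby(seq2))
    # merge lazily interleaves the two sorted streams in O(m + n).
    # A value present in both inputs now forms a run of length 2 and is dropped.
    return [key for key, run in groupby(merge(unique1, unique2))
            if sum(1 for _ in run) == 1]


print(symdiff_sorted([-1, 1, 2, 2], [-1, 3]))  # [1, 2, 3]
```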
Answer 1 (score: 0)
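Never trust a set to be sorted. When you want a sorted list back, always sort after converting the set to a list object. I am not sure about the behavior I observed in my explanation below.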
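It can look as though no sort is needed after converting back to a list, because the list often comes out already sorted, and dropping the extra sort would make the code more efficient. That ordering is not guaranteed, though.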
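If list1 and list2 are guaranteed to be sorted lists of positive int objects, the set produced by symmetric_difference seems to come back sorted in Python 3.5. If list1 or list2 contains any negative int or any float, the result needs to be sorted again.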
def sorted_symdiff(list1, list2):
    """Each list is already sorted; going through sets seems inefficient."""
    s1, s2 = set(list1), set(list2)
    # Set iteration order is not guaranteed, so sort explicitly.
    return sorted(s1.symmetric_difference(s2))
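To see the ordering caveat on your own interpreter, a quick check along these lines may help. It is a sketch with arbitrary sample values, and `is_sorted` is a hypothetical helper. Whatever order the raw list happens to come out in is an implementation detail, so no particular order is claimed here; only the explicitly sorted result is reliable.

```python
def is_sorted(values):
    """True if values are in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))


list1 = [-7, -2, 1.5, 3]
list2 = [-2, 4, 10]

# The order of `raw` depends on hashing and table layout, not on the input order.
raw = list(set(list1) ^ set(list2))
print(raw, is_sorted(raw))

# Sorting explicitly gives a dependable result.
print(sorted(raw))  # [-7, 1.5, 3, 4, 10]
```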