Python: symmetric difference of sorted lists

Time: 2017-10-06 16:18:36

Tags: python list sorting numpy set

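Is there a good way to get the symmetric difference of two sorted lists in Python and return a sorted list as the result? My current version feels like a poor workaround: convert to sets, take the symmetric difference, convert back to a list, then sort again.

A solution using NumPy is fine. The data being sorted are integers.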
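For reference, the workaround described in the question can be sketched roughly as follows. The function name `symdiff_via_set` is a placeholder chosen for this illustration, not the asker's actual code.

```python
def symdiff_via_set(list1, list2):
    """Symmetric difference via sets, followed by an explicit sort."""
    s1, s2 = set(list1), set(list2)   # duplicates collapse here
    diff = list(s1 ^ s2)              # ^ is set symmetric difference
    diff.sort()                       # sets are unordered, so sort again
    return diff


print(symdiff_via_set([1, 2, 3, 4], [3, 4, 5]))
```

The final sort is what makes this O((m+n) * log(m+n)) overall, even though both inputs were already sorted.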


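2 answers:

Answer 0: (score: 1)

Yes, there is a way. You have to take advantage of the fact that both sequences are sorted: walk through the two of them together, comparing elements one at a time, and build the symmetric difference as you advance along each sequence.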

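If you are familiar with big-O notation, the code below runs in O(m+n), where m = len(seq1) and n = len(seq2).

Your current algorithm is O((m+n) * log(m+n)), because the resulting set has to be sorted.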

  

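Caveat:

This answer is mostly an exercise in showing how to take advantage of sorted input.

Despite its better asymptotic complexity, for most inputs it runs slower than the original poster's code, which uses Python's built-in set methods. In Python, sets are implemented in C under the hood, and pure Python will have a hard time beating that. You would need very large inputs to see any advantage, if one is visible at all. This algorithm has the best complexity, but that does not mean it is faster, nor that you should use it: the built-in set methods are optimized, battle-tested C code, and they make for code that is much easier to write, read, understand, debug, and maintain.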
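The performance claim above is easy to check on your own data. The following is only a minimal timing sketch: `time_symdiff` is a hypothetical helper written for this illustration, and the input sizes and random integer data are assumptions, not measurements from the answer.

```python
import random
import timeit


def time_symdiff(func, size, repeats=5, seed=0):
    """Best wall-clock time in seconds of func(seq1, seq2) on two sorted int lists."""
    rng = random.Random(seed)
    # Two sorted lists of distinct integers that partially overlap.
    seq1 = sorted(rng.sample(range(size * 4), size))
    seq2 = sorted(rng.sample(range(size * 4), size))
    return min(timeit.repeat(lambda: func(seq1, seq2), number=1, repeat=repeats))


def set_baseline(seq1, seq2):
    """Built-in set arithmetic followed by a sort."""
    return sorted(set(seq1) ^ set(seq2))


if __name__ == "__main__":
    for size in (1_000, 100_000):
        print(size, time_symdiff(set_baseline, size))
```

Passing `get_symmetric_difference` as `func` times the pure-Python merge from this answer on the same inputs.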

def get_symmetric_difference(seq1, seq2):
    """
    computes the symmetric difference of unique elements of seq1 & seq2 
    as a new sorted list, without mutating the parameters.

    seq1: a sorted sequence of int
    seq2: a sorted sequence of int

    return: a new sorted list containing the symmetric difference 
            of unique elements of seq1 & seq2
    """

    # Empty inputs need no special case: the merge loop below is skipped
    # and the tail loops still drop duplicates from the non-empty input.

    symmetric_difference = []

    idx = 0
    jdx = 0  
    last_insert = None
    last_seen = None

    while idx < len(seq1) and jdx < len(seq2):
        s1 = seq1[idx]
        s2 = seq2[jdx]
        if s1 == s2:
            idx += 1
            jdx += 1
            last_seen = s1
        elif s1 < s2:
            if last_insert != s1 and last_seen != s1:
                symmetric_difference.append(s1)
                last_insert = s1
            idx += 1
        elif s2 < s1:
            if last_insert != s2 and last_seen != s2:
                symmetric_difference.append(s2)
                last_insert = s2
            jdx += 1

    if len(seq1[idx:]) > len(seq2[jdx:]):
        for elt in seq1[idx:]:
            if last_insert != elt and last_seen != elt:
                symmetric_difference.append(elt)
                last_insert = elt
                last_seen = elt
    else:
        for elt in seq2[jdx:]:
            if last_insert != elt and last_seen != elt:
                symmetric_difference.append(elt)
                last_insert = elt
                last_seen = elt

    return symmetric_difference

Tests:

def test_get_symmetric_difference():

    seq1 = []
    seq2 = []
    assert get_symmetric_difference(seq1, seq2) == []

    seq1 = [1]
    seq2 = []
    assert get_symmetric_difference(seq1, seq2) == [1]

    seq1 = [1, 2, 3, 4]
    seq2 = [-2, -1, 5, 6, 7, 8]
    assert get_symmetric_difference(seq1, seq2) == [-2, -1, 1, 2, 3, 4, 5, 6, 7, 8]

    seq1 = [    -1, 1, 2, 3, 4,    6,       9,  22, 34]
    seq2 = [-2, -1,             5, 6, 7, 8, 19, 22,    43]
    assert get_symmetric_difference(seq1, seq2) == [-2, 1, 2, 3, 4, 5, 7, 8, 9, 19, 34, 43]

    seq1 = [-2, -1,             5, 6, 7, 8, 19, 22,    43]
    seq2 = [    -1, 1, 2, 3, 4,    6,       9,  22, 34]
    assert get_symmetric_difference(seq1, seq2) == [-2, 1, 2, 3, 4, 5, 7, 8, 9, 19, 34, 43]

    seq1 = [-2, -1, 0,            5,       22, 34]
    seq2 = [-2, -1,   1, 2, 3, 4,    6, 9, 22, 34]
    assert get_symmetric_difference(seq1, seq2) == [0, 1, 2, 3, 4, 5, 6, 9]

    seq1 = [-2, -1, 1, 2, 3, 4, 6, 9, 22, 34]
    seq2 = [-2, -1, 1, 2, 3, 4, 6, 9, 22, 34]
    assert get_symmetric_difference(seq1, seq2) == []

    seq1 = [7, 7, 7, 7, 7, 7]
    seq2 = [-2, -1, 1, 2, 3, 4, 6, 9, 22, 34]
    assert get_symmetric_difference(seq1, seq2) == [-2, -1, 1, 2, 3, 4, 6, 7, 9, 22, 34]

    seq1 = [-2, -1, 1, 2, 3, 4, 6, 9, 22, 34]
    seq2 = [7, 7, 7, 7, 7, 7]
    assert get_symmetric_difference(seq1, seq2) == [-2, -1, 1, 2, 3, 4, 6, 7, 9, 22, 34]

    seq1 = [-2, -1, 1, 2, 3, 4, 6, 9, 22, 34]
    seq2 = [-1, -1, 7, 7, 43, 43, 43]
    assert get_symmetric_difference(seq1, seq2) == [-2, 1, 2, 3, 4, 6, 7, 9, 22, 34, 43]

    seq1 = [34, 34, 34, 34]
    seq2 = [7, 34]
    assert get_symmetric_difference(seq1, seq2) == [7]

    seq1 = [7, 34]
    seq2 = [34, 34, 34, 34]
    assert get_symmetric_difference(seq1, seq2) == [7]

    seq1 = [7, 34]
    seq2 = [7, 7, 7, 7, 7]
    assert get_symmetric_difference(seq1, seq2) == [34]

    seq1 = [7, 7, 7, 7, 34]
    seq2 = [7, 7]
    assert get_symmetric_difference(seq1, seq2) == [34]

    print("***all tests pass***")


test_get_symmetric_difference()

Output:

***all tests pass***
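The same linear merge can also be written more compactly with the standard library. This is a sketch of an alternative, not the code from the answer above: it swaps the hand-written index loop for `heapq.merge` and `itertools.groupby`, and the name `symdiff_sorted` is made up for this example.

```python
from heapq import merge
from itertools import groupby


def symdiff_sorted(seq1, seq2):
    """Symmetric difference of two sorted sequences, as a new sorted list."""
    # groupby collapses runs of equal values, so each input yields a value once.
    unique1 = (key for key, _ in groupby(seq1))
    unique2 = (key for key, _ in groupby(seq2))
    result = []
    # merge lazily interleaves the two sorted streams. After deduplication,
    # a value shows up twice in the merged stream only if both inputs have it.
    for value, run in groupby(merge(unique1, unique2)):
        if len(list(run)) == 1:
            result.append(value)
    return result


print(symdiff_sorted([7, 7, 7, 7, 34], [7, 7]))
```

It makes a single pass over both inputs and never sorts, so the cost stays O(m+n). The caveat above still applies: being pure Python, it is not necessarily faster than the set-based version in practice.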

Answer 1: (score: 0)

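Never trust a set to be sorted. Whenever you expect a sorted result, always sort after converting the set to a list object. I am not sure about the behaviour I observed in my explanation below.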

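It is tempting to think that no sort is needed after converting back to a list because the input lists are already sorted, and that dropping the extra sort would make the function more efficient. That does not hold in general: a set does not preserve the order of the list it was built from.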

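If list1 and list2 are guaranteed to be sorted lists of positive int objects, the resulting symmetric_difference set seems to come back sorted in Python 3.5. That ordering is a side effect of how CPython hashes small integers, not a guarantee. If list1 or list2 contains any negative int or any float, the result needs to be sorted again.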

def sorted_symdiff(list1,list2):
    """ Each list is already sorted, this seems inefficient """
    s1,s2 = set(list1),set(list2)
    # Set iteration order is not guaranteed, so sort explicitly.
    diff = sorted(s1.symmetric_difference(s2))
    return diff
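To illustrate why the explicit sort matters, here is a small check. It is only a sketch: `is_sorted` and `sorted_symdiff_checked` are hypothetical helpers introduced for this example, and the order of the unsorted list may differ between interpreters and versions.

```python
def sorted_symdiff_checked(list1, list2):
    """Symmetric difference via sets, with the explicit sort kept."""
    # The method form accepts any iterable as its argument.
    diff = set(list1).symmetric_difference(list2)
    # sorted() always returns a new list in ascending order.
    return sorted(diff)


def is_sorted(values):
    """True if values are in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))


list1 = [-5, -2, 1, 8, 1024]
list2 = [-2, 3, 8, 4096]

raw = list(set(list1).symmetric_difference(list2))
print(raw, is_sorted(raw))   # order depends on the interpreter's hashing
print(sorted_symdiff_checked(list1, list2))
```

Only the second result is ordered by construction. The first may happen to look sorted for some inputs, which is exactly what makes the missing sort easy to overlook.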