Python中的两个列表比较

时间:2013-07-18 07:50:37

标签: python

我有两个清单:

list1=[[['B', 10], 1], [['C', 15], 1], [['F', 30], 1]]
list2=[[['G', 20], 2], [['D', 25], 1]]

[['B', 10], 1]]中的list1[['D', 25], 1]]中的list2匹配,因为它们具有相同的第二个元素([1][1],第一个匹配)

我想说['B', 10]['D', 25]已修改,[['C', 15], 1][['F', 30], 1]已从list1中删除[['G', 20], 2]已添加到{ {1}}。

有人可以帮助我吗?我已经尝试过列表的集合和迭代,并且比较它们都不起作用。

注意:这不是作业,只是我想知道列表是如何工作的。

1 个答案:

答案 0 :(得分:0)

您可以尝试这样的事情:

list1=[[['B', 10], 1], [['C', 15], 1], [['F', 30], 1]]
list2=[[['G', 20], 2], [['D', 25], 1]]

list2_iter = iter(list2)
for item1 in list1:
    in_list2 = False
    for item2 in list2_iter:
        if item1[1] == item2[1]:
            print "Match" if item1 == item2 else "Modified", item1, item2
            in_list2 = True
            break
        else:
            print "Inserted", item2
    if not in_list2:
        print "Deleted", item1

请注意list2的内部循环使用迭代器,因此它不是每次从list2的开头开始,而是从它在外部的上一次迭代中停止的最后一个元素开始环。其他一切都很简单。

输出:

Inserted [['G', 20], 2]
Modified [['B', 10], 1] [['D', 25], 1]
Deleted [['C', 15], 1]
Deleted [['F', 30], 1]

请注意,这并不总能找到最佳匹配,因为它只会通过list1list2进行一次传递。有关更完整的算法,请查看DiffLongest Common Subsequence


更新:我刚才意识到这个问题实际上和找到两个字符串的minimum edit distance几乎一样,为此我碰巧有一些代码。经过一些概括:

import operator

def diff(s1, s2, match=operator.eq, neutral="*", subst=2):
    """s1, s2: the two strings or lists to match
    match: function to determine whether two elements match up
    neutral: 'neutral' element for padding
    subst: substitution costs
    """
    s1, s2 = neutral + s1, neutral + s2

    # calculate edit distance / sequence match with DP
    A = [[0] * len(s2) for i in range(len(s1))]
    for i in range(len(s1)):
        for k in range(len(s2)):
            if min(i, k) == 0:
                A[i][k] = max(i, k)
            else:
                diag = 0 if match(s1[i], s2[k]) else subst
                A[i][k] = min(A[i-1][k-1] + diag,
                              A[i  ][k-1] + 1,
                              A[i-1][k  ] + 1)

    # reconstruct path
    path, i, k = [], len(s1)-1, len(s2)-1
    while i or k:
        if A[i][k] == A[i-1][k] + 1:
            path, i = [-1] + path, i-1
        elif A[i][k] == A[i][k-1] + 1:
            path, k = [1] + path, k-1
        else:
            path, i, k = [0] + path, i-1, k-1

    return A[len(s1)-1][len(s2)-1], path


def print_match(list1, list2, path):
    i1, i2 = iter(list1), iter(list2)
    for p in path:
        if p == -1: print "DEL %20r"      %  next(i1)
        if p ==  0: print "EQ  %20r %20r" % (next(i1), next(i2))
        if p == +1: print "INS %41r"      %            next(i2)

# with strings
word1, word2 = "INTENTION", "EXECUTION"
x, path = diff(word1, word2)
print_match(word1, word2, path)

# with your lists of lists
list1 = [[['B', 10], 1], [['C', 15], 1], [['F', 30], 1]]
list2 = [[['G', 20], 2], [['D', 25], 1]]
x, path = diff(list1, list2, match=lambda x, y: x[1] == y[1], neutral=[[None, -1]])
print_match(list1, list2, path)

请注意,生成的匹配仍然与您的预期不同,因为通常有许多匹配元素的最佳方法,但它应该是最佳的。