Python: difference between two lists that may contain duplicate values

Time: 2017-06-13 18:27:23

Tags: python list

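I have two lists, each containing non-unique numbers, meaning the same value can appear more than once in either list.

I need to find the difference between the two, taking into account that the same value may appear multiple times (so I can't just take the difference of their sets). In other words, I need to check whether a value appears more times in the first list than in the second.

The lists are: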

l1 = [1, 2, 5, 3, 3, 4, 9, 8, 2]
l2 = [1, 1, 3, 2, 4, 8, 9]

# Sorted and justified
l1 = [1,    2, 2, 3, 3, 4, 5, 8, 9]
l2 = [1, 1, 2,    3,    4,    8, 9]

The list elements can be strings, ints, or floats. So the resulting lists should be:

difference(l1, l2) == [3, 5, 2] 
# There is an extra 2 and 3 in l1 that is not in l2, and a 5 in l1 but not l2. 

difference(l2, l1) == [1]
# The extra 1 is the only value in l2 but not in l1.

I tried the list comprehension [x for x in l1 if x not in l2], but that doesn't work because it doesn't account for values that are duplicated in both lists.
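
To make the failure concrete, here is a minimal check using the two lists from the question. The membership test only asks whether a value occurs in l2 at all, so the surplus 2 and 3 are silently dropped:

```python
l1 = [1, 2, 5, 3, 3, 4, 9, 8, 2]
l2 = [1, 1, 3, 2, 4, 8, 9]

# `x not in l2` ignores how many times x occurs, so only values
# that are entirely absent from l2 survive the filter.
naive = [x for x in l1 if x not in l2]
print(naive)  # [5]
```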

3 answers:

Answer 0: (score: 4)

If order doesn't matter, you can use a Counter (see the standard library's collections module):

from collections import Counter

l1 = [1,2,5,3,3,4,9,8,2]
l2 = [1,1,3,2,4,8,9]

c1 = Counter(l1) # Counter({2: 2, 3: 2, 1: 1, 5: 1, 4: 1, 9: 1, 8: 1})
c2 = Counter(l2) # Counter({1: 2, 3: 1, 2: 1, 4: 1, 8: 1, 9: 1})

diff1 = list((c1-c2).keys()) # [2, 5, 3]
diff2 = list((c2-c1).keys()) # [1]

This is quite general and also works for strings:

...
l1 = ['foo', 'foo', 'bar']
l2 = ['foo', 'bar', 'bar', 'baz']
...
# diff1 == ['foo']
# diff2 == ['bar', 'baz']
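
One thing to watch with the Counter approach above: .keys() lists each surplus value once, however many extra copies there are. If the multiplicity matters, Counter.elements() repeats each value by its remaining count. A small sketch (the sample lists a and b are made up for illustration):

```python
from collections import Counter

a = [1, 1, 1, 2]
b = [1]

# Counter subtraction keeps only positive counts.
surplus = Counter(a) - Counter(b)  # Counter({1: 2, 2: 1})

print(list(surplus.keys()))      # [1, 2]     each surplus value once
print(list(surplus.elements()))  # [1, 1, 2]  each value repeated by its count
```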

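Answer 1: (score: 2)

I expect a lot of people will land here looking for a multiset difference (for example: [1, 1, 1, 2, 2, 2, 3, 3] - [1, 2, 2] == [1, 1, 2, 3, 3]), so I'll post an answer for that here as well: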

import collections

def multiset_difference(a, b):
    """Compute a - b of two multisets a and b"""
    a = collections.Counter(a)
    b = collections.Counter(b)

    difference = a - b
    return difference  # Remove this line if you want it as a list

    as_list = []
    for item, count in difference.items():
        as_list.extend([item] * count)
    return as_list

def ordered_multiset_difference(a, b):
    """As above, but preserves order; O(len(a) * len(b)) in the worst case"""
    difference = list(a)
    depleted = set()  # Values that aren't in difference to prevent searching the list again
    for i in b:
        if i not in depleted:
            try:
                difference.remove(i)
            except ValueError:
                depleted.add(i)
    return difference
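
If the quadratic worst case of ordered_multiset_difference is a concern, a single pass over a with a Counter of b should give the same order-preserving result in roughly O(len(a) + len(b)) time, assuming the items are hashable. This is a sketch of an alternative, not part of the answer above, and the function name is hypothetical:

```python
from collections import Counter

def ordered_multiset_difference_linear(a, b):
    """Compute a - b as multisets, keeping the order of a (hypothetical helper)."""
    remaining = Counter(b)  # how many copies of each item still need to be cancelled
    result = []
    for item in a:
        if remaining[item] > 0:
            # This occurrence is cancelled by one copy from b.
            remaining[item] -= 1
        else:
            result.append(item)
    return result

print(ordered_multiset_difference_linear([1, 1, 1, 2, 2, 2, 3, 3], [1, 2, 2]))
# [1, 1, 2, 3, 3]
```

Like list.remove(), this cancels the earliest occurrences first, so later duplicates are the ones that remain.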

Answer 2: (score: 0)

Using Counter is probably the better choice, but to roll your own:

def diff(a, b):
    result = []
    cpy = b[:]
    for ele in a:
        if ele in cpy:
            cpy.remove(ele)
        else:
            result.append(ele)
    return result

Or as an abusive one-liner:

def diff(a, b):
    return [ele for ele in a if ele not in b or b.remove(ele)]

The one-liner destroys b while building the difference, so you may want to pass a copy: diff(l1, l2[:]), or use:

def diff(a, b):
    cpy = b[:]
    return [ele for ele in a if ele not in cpy or cpy.remove(ele)]
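
A quick check of the side effect described above, using the lists from the question. The destructive one-liner is renamed diff_destructive here (a hypothetical name) to keep it apart from the copying version:

```python
def diff_destructive(a, b):
    # list.remove() returns None (falsy), so a matched element is
    # removed from b and filtered out of the result in one expression.
    return [ele for ele in a if ele not in b or b.remove(ele)]

l1 = [1, 2, 5, 3, 3, 4, 9, 8, 2]
l2 = [1, 1, 3, 2, 4, 8, 9]

result = diff_destructive(l1, l2)
print(result)  # [5, 3, 2]
print(l2)      # [1]  (l2 itself was modified)

# Passing a copy leaves the caller's list untouched.
l2 = [1, 1, 3, 2, 4, 8, 9]
safe_result = diff_destructive(l1, l2[:])
print(l2)      # [1, 1, 3, 2, 4, 8, 9]
```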