Looking for a more pythonic list comparison solution

Time: 2011-07-21 00:15:27

Tags: python list

OK, I have two lists:

x = [1, 2, 3, 4]
y = [1, 1, 2, 5, 6]

I want to compare them in such a way that I get the following output:

x = [3, 4]
y = [1, 5, 6]

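The basic idea is to go through each list and compare them. If they have an element in common, remove that element, but only one copy of it, not all of them. If there is no common element, leave the lists alone. Two identical lists would become x = [], y = [].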
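The rule just described can be pinned down as a small reference function that faster answers can be checked against. This is a sketch added for illustration, not the asker's code: `cancel_common` is a hypothetical name, the order of both lists is preserved, and `list.remove` makes it O(len(x) * len(y)), so it is only meant as a baseline.

```python
def cancel_common(x, y):
    """Return copies of x and y with common elements cancelled one copy at a time."""
    x_left = list(x)  # work on a copy so the caller's list is not modified
    y_left = []
    for item in y:
        if item in x_left:
            # a matching copy is still available in x: cancel one copy from each side
            x_left.remove(item)
        else:
            # nothing left in x to cancel against, so this copy survives in y
            y_left.append(item)
    return x_left, y_left


print(cancel_common([1, 2, 3, 4], [1, 1, 2, 5, 6]))  # ([3, 4], [1, 5, 6])
```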

Here is my very hacky and very lame solution. I'm hoping someone can recommend a better and/or more pythonic way to do this. Three loops seems excessive...

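    xlist = [1, 2, 3, 4]
    ylist = [1, 1, 2, 5, 6]
    done = False

    while not done:
        done = True  # stays True only if a whole pass finds nothing in common
        for x in xlist:
            for y in ylist:
                if x == y:
                    xlist.remove(x)
                    ylist.remove(y)
                    done = False
                    break  # x has been removed; stop comparing it against ylist
        print(xlist, ylist)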

As always, thanks for taking the time to read this question. XOXO

6 answers:

Answer 0 (score: 7)

The data structure you are looking for is probably a multiset (or "bag"). If so, a good way to implement it in Python is collections.Counter:

>>> from collections import Counter
>>> x = Counter([1, 2, 3, 4])
>>> y = Counter([1, 1, 2, 5, 6])
>>> x - y
Counter({3: 1, 4: 1})
>>> y - x
Counter({1: 1, 5: 1, 6: 1})

If you want to turn the Counter objects back into lists with multiplicity, you can use the elements method:

>>> list((x - y).elements())
[3, 4]
>>> list((y - x).elements())
[1, 5, 6]
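The `-` operator on `Counter` keeps only positive counts, which is why `x - y` and `y - x` are computed separately above. As a sketch of an alternative (not part of this answer), `Counter.subtract` keeps zero and negative counts, so a single counter holds both sides: positive counts are copies left over in the first list, negative counts are copies left over in the second. `leftovers` is a hypothetical name, and the output order shown relies on dicts preserving insertion order (Python 3.7+).

```python
from collections import Counter


def leftovers(x, y):
    """Multiset difference in both directions from a single Counter."""
    diff = Counter(x)
    diff.subtract(y)  # unlike `-`, subtract() keeps zero and negative counts
    x_left = [k for k, n in diff.items() for _ in range(n)]   # n > 0: extra copies in x
    y_left = [k for k, n in diff.items() for _ in range(-n)]  # n < 0: extra copies in y
    return x_left, y_left


print(leftovers([1, 2, 3, 4], [1, 1, 2, 5, 6]))  # ([3, 4], [1, 5, 6])
```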

Answer 1 (score: 3)

Building on Gareth's answer:

>>> a = Counter([1, 2, 3, 4])
>>> b = Counter([1, 1, 2, 5, 6])
>>> list((a - b).elements())
[3, 4]
>>> list((b - a).elements())
[1, 5, 6]

Benchmark code:

from collections import Counter
from collections import defaultdict
import random

# short lists
#a = [1, 2, 3, 4, 7, 8, 9]
#b = [1, 1, 2, 5, 6, 8, 8, 10]

# long lists
a = []
b = []

for i in range(0, 1000):
    q = random.choice((1, 2, 3, 4))
    if q == 1:
        a.append(i)
    elif q == 2:
        b.append(i)
    elif q == 3:
        a.append(i)
        b.append(i)
    else:
        a.append(i)
        b.append(i)
        b.append(i)

# Modifies the lists in-place! Naughty! And it doesn't actually work, to boot.
def original(xlist, ylist):
    done = False

    while not done:
        done = True
        for x in xlist:
            for y in ylist:
                if x == y:
                    xlist.remove(x)
                    ylist.remove(y)
                    done = False
    return xlist, ylist # not strictly necessary, see above


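def counter(xlist, ylist):
    x = Counter(xlist)
    y = Counter(ylist)
    # returns lazy elements() iterators; the timing loop never consumes them,
    # so building the result lists is not included in the timings
    return ((x-y).elements(), (y-x).elements())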


def nasty(xlist, ylist):
    x = sum(([i]*(xlist.count(i)-ylist.count(i)) for i in set(xlist)),[])
    y = sum(([i]*(ylist.count(i)-xlist.count(i)) for i in set(ylist)),[])

    return x, y


def gnibbler(xlist, ylist):
    d = defaultdict(int)
    for i in xlist: d[i] += 1
    for i in ylist: d[i] -= 1
    return [k for k,v in d.items() for i in range(v)], [k for k,v in d.items() for i in range(-v)]

# substitute algorithm to test in the call
for x in range(0, 100000):
    original(list(a), list(b))
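The comment on `original` says it doesn't actually work. A minimal input that shows one failure mode (the input is chosen here for illustration and is not part of the answer): removing items from `ylist` while the inner loop iterates over it shifts the remaining items, and the inner loop keeps comparing an `x` that has already been removed from `xlist`.

```python
def original(xlist, ylist):
    # same logic as `original` in the benchmark above
    done = False
    while not done:
        done = True
        for x in xlist:
            for y in ylist:
                if x == y:
                    xlist.remove(x)
                    ylist.remove(y)
                    done = False
    return xlist, ylist


# After the first 1 matches, xlist is empty but the inner loop still holds x == 1,
# so the second 1 in ylist triggers xlist.remove(1) on an empty list.
try:
    original([1], [1, 3, 1])
    crashed = False
except ValueError as exc:
    crashed = True
    print("ValueError:", exc)
```

On the question's own lists it happens to return the expected result, which is why the bug is easy to miss.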

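Results of a Not Very Rigorous Benchmark[tm] (the short lists are the original lists; the long lists are randomly generated, about 1000 entries long, with a mix of matches and duplicates; times are given as multiples of the original algorithm's time):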

             100K iterations, short lists    1K iterations, long lists
Original     1.0                             1.0
Counter      9.3                             0.06
Nasty        2.9                             1.4
Gnibbler     2.4                             0.02

Note 1: For small lists, the cost of creating the Counter objects seems to swamp the actual algorithm.

Note 2: Original and gnibbler break even at a list length of about 35; above that, gnibbler (and Counter) are faster.
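For anyone who wants to rerun a comparison like this, `timeit` gives steadier numbers than a bare loop. A minimal sketch, assuming Python 3, with two of the functions re-declared so the snippet is self-contained. Two deliberate differences from the benchmark above: `counter` here wraps `elements()` in `list()` so the result lists are actually built, and the random inputs (size, value range, seed) are arbitrary choices. No timings are claimed here.

```python
import random
import timeit
from collections import Counter, defaultdict


def counter(xlist, ylist):
    x, y = Counter(xlist), Counter(ylist)
    # list() consumes the lazy elements() iterators so building the results is timed too
    return list((x - y).elements()), list((y - x).elements())


def gnibbler(xlist, ylist):
    d = defaultdict(int)
    for i in xlist:
        d[i] += 1
    for i in ylist:
        d[i] -= 1
    return ([k for k, v in d.items() for _ in range(v)],
            [k for k, v in d.items() for _ in range(-v)])


random.seed(0)  # arbitrary seed so repeated runs use the same data
a = [random.randrange(500) for _ in range(1000)]
b = [random.randrange(500) for _ in range(1000)]

# the two functions may order the second list differently, so compare sorted results
for p, q in zip(counter(a, b), gnibbler(a, b)):
    assert sorted(p) == sorted(q)

for func in (counter, gnibbler):
    # best of 5 runs of 100 calls each; fresh copies mirror list(a), list(b) above
    best = min(timeit.repeat(lambda: func(list(a), list(b)), number=100, repeat=5))
    print(f"{func.__name__:10s} {best:.4f} s per 100 calls")
```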

Answer 2 (score: 3)

If you don't care about order, use collections.Counter to do it in one line:

>>> Counter(x)-Counter(y)
Counter({3: 1, 4: 1})

>>> Counter(y)-Counter(x)
Counter({1: 1, 5: 1, 6: 1})

If you do care about order, you can iterate over your list using the Counter difference above:

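from collections import Counter

def prune(seq, toPrune):
    """Prunes elements from the front of seq in O(N) time"""
    remainder = Counter(seq)-Counter(toPrune)
    R = []
    for x in reversed(seq):
        if remainder.get(x):
            remainder[x] -= 1
            R.append(x)
    R.reverse()  # append + one reverse keeps this O(N); insert(0, x) would be O(N**2)
    return R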

演示:

>>> prune(x,y)
[3, 4]
>>> prune(y,x)
[1, 5, 6]
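A variant that returns both lists at once, sketched here rather than taken from the answer: `Counter(x) & Counter(y)` (intersection, which keeps the minimum of each count) says how many copies to cancel, and one forward pass per list drops that many of the earliest copies. It drops the same copies as `prune` but needs no reversed walk. `cancel_in_order` is a hypothetical name.

```python
from collections import Counter


def cancel_in_order(x, y):
    """Drop common elements (one copy per match) from both lists, keeping order."""
    common = Counter(x) & Counter(y)  # & keeps the minimum count of each shared element

    def drop(seq):
        budget = Counter(common)  # fresh copy: how many copies of each item still to drop
        kept = []
        for item in seq:
            if budget[item] > 0:  # missing keys read as 0 on a Counter
                budget[item] -= 1  # drop this (earliest remaining) copy
            else:
                kept.append(item)
        return kept

    return drop(x), drop(y)


print(cancel_in_order([1, 2, 3, 4], [1, 1, 2, 5, 6]))  # ([3, 4], [1, 5, 6])
```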

Answer 3 (score: 2)

Using only collections.defaultdict, so it works on Python 2.5+:

>>> x = [1, 2, 3, 4]
>>> y = [1, 1, 2, 5, 6]
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for i in x:
...  d[i] += 1
... 
>>> for i in y:
...  d[i] -= 1
... 
>>> [k for k,v in d.items() for i in range(v)]
[3, 4]
>>> [k for k,v in d.items() for i in range(-v)]
[1, 5, 6]

If the repeat counts get large, I find this works better than range (or xrange):

>>> from itertools import repeat
>>> [k for k,v in d.items() for i in repeat(None, v)]

Answer 4 (score: 0)

Pretty nasty :P

a = sum(([i]*(x.count(i)-y.count(i)) for i in set(x)),[])
b = sum(([i]*(y.count(i)-x.count(i)) for i in set(y)),[])

x,y = a,b

Answer 5 (score: 0)

If you don't care about duplicates, this is simple:

>>> x=[1,2,3,4]
>>> y=[1,1,2,5,6]
>>> list(set(x).difference(set(y)))
[3, 4]
>>> list(set(y).difference(set(x)))
[5, 6]
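On the question's data the cost of ignoring duplicates is easy to see. A short comparison, assuming Python 3 (sorting is added only because set iteration order is not guaranteed): the set version loses the `1` that should remain in `y`, while the `Counter` version keeps it.

```python
from collections import Counter

x = [1, 2, 3, 4]
y = [1, 1, 2, 5, 6]

set_result = sorted(set(y) - set(x))                             # duplicates collapse first
multiset_result = sorted((Counter(y) - Counter(x)).elements())  # multiplicity kept

print(set_result)       # [5, 6]
print(multiset_result)  # [1, 5, 6]
```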