OK, I have two lists:
x = [1, 2, 3, 4]
y = [1, 1, 2, 5, 6]
I want to compare them so that I end up with the following output:
x = [3, 4]
y = [1, 5, 6]
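The basic idea is to walk through each list and compare. If the lists have an element in common, remove that element from both, but only one occurrence of it, not all of them. If they have no element in common, leave them alone. Two identical lists would become x = [], y = [].
Here is my very nasty and very lame solution. I'm hoping someone can recommend a better and/or more Pythonic way to do this. Three loops seems excessive...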
done = False
while not done:
    done = True
    for x in xlist:
        for y in ylist:
            if x == y:
                xlist.remove(x)
                ylist.remove(y)
                done = False
print xlist, ylist
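As always, thanks for taking the time to read this question. XOXO
Answer 0 (score: 7)
The data structure you are looking for is probably a multiset (or "bag"), and if so, a good way to implement it in Python is to use collections.Counter: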
>>> from collections import Counter
>>> x = Counter([1, 2, 3, 4])
>>> y = Counter([1, 1, 2, 5, 6])
>>> x - y
Counter({3: 1, 4: 1})
>>> y - x
Counter({1: 1, 5: 1, 6: 1})
If you want to convert the Counter objects back into lists with multiplicity, you can use the elements method:
>>> list((x - y).elements())
[3, 4]
>>> list((y - x).elements())
[1, 5, 6]
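The two subtractions above can be wrapped in a small helper. This is only a minimal sketch: the name multiset_diff is made up for illustration and is not part of collections, and the results are sorted purely to make the output predictable, which assumes the elements can be compared with each other.

```python
from collections import Counter


def multiset_diff(xs, ys):
    """Return (xs minus ys, ys minus xs) as sorted lists, respecting multiplicity.

    Hypothetical helper for illustration; not a standard-library function.
    """
    cx, cy = Counter(xs), Counter(ys)
    # Counter subtraction keeps only positive counts, so each shared
    # occurrence cancels exactly one occurrence on the other side.
    left = sorted((cx - cy).elements())
    right = sorted((cy - cx).elements())
    return left, right


print(multiset_diff([1, 2, 3, 4], [1, 1, 2, 5, 6]))  # ([3, 4], [1, 5, 6])
print(multiset_diff([1, 2], [1, 2]))                 # ([], [])
```

If the original order matters, drop the sorted() calls and see the order-preserving answer further down.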
Answer 1 (score: 3)
Building on Gareth's answer:
>>> from collections import Counter
>>> a = Counter([1, 2, 3, 4])
>>> b = Counter([1, 1, 2, 5, 6])
>>> list((a - b).elements())
[3, 4]
>>> list((b - a).elements())
[1, 5, 6]
Benchmark code:
from collections import Counter
from collections import defaultdict
import random
# short lists
#a = [1, 2, 3, 4, 7, 8, 9]
#b = [1, 1, 2, 5, 6, 8, 8, 10]
# long lists
a = []
b = []
for i in range(0, 1000):
    q = random.choice((1, 2, 3, 4))
    if q == 1:
        a.append(i)
    elif q == 2:
        b.append(i)
    elif q == 3:
        a.append(i)
        b.append(i)
    else:
        a.append(i)
        b.append(i)
        b.append(i)
# Modifies the lists in-place! Naughty! And it doesn't actually work, to boot.
def original(xlist, ylist):
    done = False
    while not done:
        done = True
        for x in xlist:
            for y in ylist:
                if x == y:
                    xlist.remove(x)
                    ylist.remove(y)
                    done = False
    return xlist, ylist  # not strictly necessary, see above

def counter(xlist, ylist):
    x = Counter(xlist)
    y = Counter(ylist)
    return ((x-y).elements(), (y-x).elements())

def nasty(xlist, ylist):
    x = sum(([i]*(xlist.count(i)-ylist.count(i)) for i in set(xlist)),[])
    y = sum(([i]*(ylist.count(i)-xlist.count(i)) for i in set(ylist)),[])
    return x, y

def gnibbler(xlist, ylist):
    d = defaultdict(int)
    for i in xlist: d[i] += 1
    for i in ylist: d[i] -= 1
    return [k for k,v in d.items() for i in range(v)], [k for k,v in d.items() for i in range(-v)]

# substitute algorithm to test in the call
for x in range(0, 100000):
    original(list(a), list(b))
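Results of a not-so-rigorous benchmark [tm] (the short lists are the original lists from the question; the long lists are randomly generated, about 1000 entries long, with a mix of matches and duplicates; times are given as multiples of the original algorithm's time):
            100K iterations, short lists    1K iterations, long lists
Original    1.0                             1.0
Counter     9.3                             0.06
Nasty       2.9                             1.4
Gnibbler    2.4                             0.02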
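Note 1: at small list sizes, the cost of creating the Counter objects seems to swamp the cost of the actual algorithm.
Note 2: original and gnibbler take about the same time at a list length of roughly 35; above that, gnibbler (and Counter) are faster.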
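On the "doesn't actually work" comment in the benchmark code: original removes items from both lists while iterating over them, so it can skip elements or try to remove something that is already gone. The sketch below reproduces original as given and runs it on copies. The wrapper name try_original and the second input are my own choices for illustration, not part of the benchmark.

```python
def original(xlist, ylist):
    done = False
    while not done:
        done = True
        for x in xlist:
            for y in ylist:
                if x == y:
                    xlist.remove(x)
                    ylist.remove(y)
                    done = False
    return xlist, ylist


def try_original(xs, ys):
    """Run original() on copies and report a failure instead of raising."""
    try:
        return original(list(xs), list(ys))
    except ValueError as exc:
        return "ValueError: %s" % exc


# The lists from the question happen to come out right.
print(try_original([1, 2, 3, 4], [1, 1, 2, 5, 6]))  # ([3, 4], [1, 5, 6])

# Here x is still 1 after its only copy was removed from xlist, so the
# second 1 in ylist triggers another xlist.remove(1), which fails.
print(try_original([1], [1, 2, 1]))
```

The counting-based versions never mutate their inputs, so they avoid this failure mode altogether.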
Answer 2 (score: 3)
If you don't care about order, use collections.Counter to do this in one line:
>>> Counter(x)-Counter(y)
Counter({3: 1, 4: 1})
>>> Counter(y)-Counter(x)
Counter({1: 1, 5: 1, 6: 1})
If you do care about order, you can iterate over your list and consult the Counter difference from above:
def prune(seq, toPrune):
    """Prunes elements from front of seq in O(N) time"""
    remainder = Counter(seq)-Counter(toPrune)
    R = []
    for x in reversed(seq):
        if remainder.get(x):
            remainder[x] -= 1
            R.append(x)  # append is O(1); insert(0, x) would make the loop quadratic
    R.reverse()  # restore the original order
    return R
Demo:
>>> prune(x,y)
[3, 4]
>>> prune(y,x)
[1, 5, 6]
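The same result can also be reached by walking forward through the list instead of backward. This is a rough sketch rather than the answer's own code: prune_front and ordered_diff are hypothetical names, and it relies on Counter's & operator giving the per-item minimum of the two counts, which is the number of occurrences that should cancel.

```python
from collections import Counter


def prune_front(seq, to_prune):
    """Drop matched occurrences from seq, earliest first, keeping the rest in order."""
    # For each item, how many occurrences are shared and must be dropped.
    budget = Counter(seq) & Counter(to_prune)
    kept = []
    for item in seq:
        if budget[item] > 0:
            budget[item] -= 1  # this occurrence cancels against to_prune
        else:
            kept.append(item)
    return kept


def ordered_diff(xs, ys):
    """Apply prune_front in both directions (hypothetical convenience wrapper)."""
    return prune_front(xs, ys), prune_front(ys, xs)


print(ordered_diff([1, 2, 3, 4], [1, 1, 2, 5, 6]))  # ([3, 4], [1, 5, 6])
print(ordered_diff([3, 1, 2, 1, 4], [1, 2, 2]))     # ([3, 1, 4], [2])
```

The second call uses unsorted input to show that the surviving elements keep their original relative order.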
Answer 3 (score: 2)
Just use collections.defaultdict, which works with Python 2.5+:
>>> x = [1, 2, 3, 4]
>>> y = [1, 1, 2, 5, 6]
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for i in x:
...     d[i] += 1
...
>>> for i in y:
...     d[i] -= 1
...
>>> [k for k,v in d.items() for i in range(v)]
[3, 4]
>>> [k for k,v in d.items() for i in range(-v)]
[1, 5, 6]
If the repeat counts get large, I find this works better than range (or xrange):
>>> from itertools import repeat
>>> [k for k,v in d.items() for i in repeat(None, v)]
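One way to package the repeat idea is sketched below. The helper names net_counts and expand are invented for illustration, and the max(0, ...) guard is a defensive assumption so the code does not depend on how repeat treats a negative count.

```python
from collections import defaultdict
from itertools import chain, repeat


def net_counts(xs, ys):
    """Count +1 for every item in xs and -1 for every item in ys."""
    d = defaultdict(int)
    for i in xs:
        d[i] += 1
    for i in ys:
        d[i] -= 1
    return d


def expand(counts, sign=1):
    """Emit each key sign*count times; keys with a non-positive product are skipped."""
    return list(chain.from_iterable(
        repeat(k, max(0, sign * v)) for k, v in counts.items()))


d = net_counts([1, 2, 3, 4], [1, 1, 2, 5, 6])
print(sorted(expand(d)))      # [3, 4]
print(sorted(expand(d, -1)))  # [1, 5, 6]
```

repeat(k, n) hands back the key itself n times, so there is no throwaway loop variable and no intermediate range object per key.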
Answer 4 (score: 0)
Rather nasty :P
a = sum(([i]*(x.count(i)-y.count(i)) for i in set(x)),[])
b = sum(([i]*(y.count(i)-x.count(i)) for i in set(y)),[])
x,y = a,b
Answer 5 (score: 0)
If you don't care about duplicates, this is simple:
>>> x=[1,2,3,4]
>>> y=[1,1,2,5,6]
>>> list(set(x).difference(set(y)))
[3, 4]
>>> list(set(y).difference(set(x)))
[5, 6]