Intersection of two lists, including duplicates?

Time: 2016-06-05 17:58:58

Tags: python python-2.7

>>> a = [1,1,1,2,3,4,4]
>>> b = [1,1,2,3,3,3,4]

[1,1,2,3,4]

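Note that this is not the same question as "Python intersection of two lists keeping duplicates": even though list a contains three 1s, list b contains only two, so the result should contain only two.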

6 answers:

Answer 0: (score: 14)

You can use collections.Counter. Taking the & of two Counters gives you, for each element, the lowest count found in either list.

from collections import Counter

c = list((Counter(a) & Counter(b)).elements())

Output

[1, 1, 2, 3, 4]
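
To make the one-liner above easier to follow, here is a minimal step-by-step sketch of the same idea. The intermediate names (count_a, count_b, common) and the final sorted() call are illustrative additions, not part of the original answer; sorting is only there so the printed order does not depend on the order in which elements() yields items.

```python
from collections import Counter

a = [1, 1, 1, 2, 3, 4, 4]
b = [1, 1, 2, 3, 3, 3, 4]

# A Counter maps each element to the number of times it occurs.
count_a = Counter(a)  # in a: 1 occurs 3 times, 2 once, 3 once, 4 twice
count_b = Counter(b)

# & keeps only the keys present in both Counters,
# each with the smaller of its two counts.
common = count_a & count_b

# elements() repeats every key as many times as its count.
c = sorted(common.elements())
print(c)
```

Because the counts are taken as a minimum, 1 appears twice in the result (three times in a, twice in b) and 3 appears once (once in a, three times in b).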

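Answer 1: (score: 3)

Simple, no extra imports, easy to debug :)

Downside: the contents of list b are modified. If you don't want b to change, work on a copy of b.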

c = list()
for x in a:
    if x in b:
        b.remove(x)
        c.append(x)
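
If b must stay untouched, the copy suggested above could look roughly like this. The function name intersect_keep_duplicates and the variable remaining are hypothetical names chosen for this sketch; the loop itself is the same as in the answer.

```python
def intersect_keep_duplicates(a, b):
    # Hypothetical helper: same loop as above, but run against a copy of b.
    remaining = list(b)  # shallow copy, so the caller's list is not modified
    result = []
    for x in a:
        if x in remaining:
            remaining.remove(x)  # consume one matching occurrence
            result.append(x)
    return result


a = [1, 1, 1, 2, 3, 4, 4]
b = [1, 1, 2, 3, 3, 3, 4]
c = intersect_keep_duplicates(a, b)
print(c)
print(b)  # b still holds its original contents
```

Both the membership test and remove() scan the list, so the running time grows with len(a) * len(b). That is fine for short lists but may become noticeable on large ones.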

Answer 2: (score: 1)

Do this:

from itertools import chain
list(chain.from_iterable([(val,)*min(a.count(val), b.count(val)) for val in (set(a) & set(b))]))

Answer 3: (score: 1)

This should also work, provided both lists are sorted.

a = [1, 1, 1, 2, 3, 4, 4]
b = [1, 1, 2, 3, 3, 3, 4]
c = []
i, j = 0, 0
while i < len(a) and j < len(b):
    if a[i] == b[j]:
        c.append(a[i])
        i += 1
        j += 1
    elif a[i] > b[j]:
        j += 1
    else:
        i += 1

print(c) # [1, 1, 2, 3, 4]
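
The two-pointer walk above only advances correctly when both inputs are in ascending order. A minimal sketch that sorts copies first, so it can also be called on unordered lists, might look like this; the function name intersect_sorted and the shuffled sample inputs are assumptions made for illustration.

```python
def intersect_sorted(a, b):
    # Hypothetical wrapper: sort copies first, then run the same
    # two-pointer walk as in the answer.
    a, b = sorted(a), sorted(b)
    c = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            c.append(a[i])  # common element: consume one from each list
            i += 1
            j += 1
        elif a[i] > b[j]:
            j += 1  # b[j] is smaller, so it cannot match anything left in a
        else:
            i += 1  # a[i] is smaller, so it cannot match anything left in b
    return c


# The same values as in the question, in shuffled order.
result = intersect_sorted([4, 1, 3, 1, 2, 1, 4], [3, 1, 4, 3, 2, 1, 3])
print(result)
```

The result comes back in ascending order, which may differ from the order of either input.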

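Answer 4: (score: 1)

The accepted solution using Counter is simple, but I think this dictionary-based approach also works and can be faster, even on lists that are not sorted (that requirement was never actually stated, but at least one other solution assumes it).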

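from collections import Counter

a = [1, 1, 1, 2, 3, 4, 4]
b = [1, 1, 2, 3, 3, 3, 4]

def intersect(nums1, nums2):
    # Count how many times each value occurs in nums1.
    match = {}
    for x in nums1:
        if x in match:
            match[x] += 1
        else:
            match[x] = 1

    # Keep each value of nums2 while nums1 still has an unmatched copy of it.
    result = []
    for x in nums2:
        if x in match:
            result.append(x)
            match[x] -= 1
            if match[x] == 0:
                del match[x]

    return result

def intersect2(nums1, nums2):
    return list((Counter(nums1) & Counter(nums2)).elements())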

timeit intersect(a,b)
100000 loops, best of 3: 3.8 µs per loop

timeit intersect2(a,b)
The slowest run took 4.90 times longer than the fastest. This could mean 
that an intermediate result is being cached.
10000 loops, best of 3: 20.4 µs per loop

I also tested with lists of random integers of size 1000 and 10000, and it was faster there too.

import random

a = [random.randint(0,100) for r in xrange(10000)]
b = [random.randint(0,100) for r in xrange(1000)]

timeit intersect(a,b)
100 loops, best of 3: 2.35 ms per loop

timeit intersect2(a,b)
100 loops, best of 3: 4.2 ms per loop

And with lists that have more elements in common:

a = [random.randint(0,10) for r in xrange(10000)]
b = [random.randint(0,10) for r in xrange(1000)]

timeit intersect(a,b)
100 loops, best of 3: 2.07 ms per loop

timeit intersect2(a,b)
100 loops, best of 3: 3.41 ms per loop
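
Before comparing timings, it can help to confirm that the two functions return the same multiset of elements. The following is a rough sanity check under a few assumptions: it restates both functions so that it runs on its own, uses range so that it works on Python 2 and 3 alike, and fixes the random seed purely for repeatability. The name same is a hypothetical variable introduced for this sketch.

```python
import random
from collections import Counter


def intersect(nums1, nums2):
    # Dictionary-based version: count nums1, then consume counts along nums2.
    match = {}
    for x in nums1:
        match[x] = match.get(x, 0) + 1
    result = []
    for x in nums2:
        if x in match:
            result.append(x)
            match[x] -= 1
            if match[x] == 0:
                del match[x]
    return result


def intersect2(nums1, nums2):
    # Counter-based version from the accepted answer.
    return list((Counter(nums1) & Counter(nums2)).elements())


random.seed(0)  # arbitrary seed, only to make the run repeatable
a = [random.randint(0, 100) for _ in range(10000)]
b = [random.randint(0, 100) for _ in range(1000)]

# The two results may be ordered differently, so compare them sorted.
same = sorted(intersect(a, b)) == sorted(intersect2(a, b))
print(same)
```

Note that intersect returns elements in the order they appear in its second argument, so the comparison sorts both results first.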

Answer 5: (score: 0)

This should also work:

def list_intersect(lisA, lisB):
    """ Finds the intersection of 2 lists including common duplicates"""

    Iset = set(lisA).intersection(set(lisB))
    Ilis = []
    for i in Iset:
        num = min(lisA.count(i), lisB.count(i))
        for j in range(num):
            Ilis.append(i)
    return Ilis