First common element of two lists

Time: 2013-04-20 09:13:32

Tags: python

x = [8,2,3,4,5]
y = [6,3,7,2,1]

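How can I find the first common element of two lists (in this case "2") in a concise, elegant way? Either list can be empty, or the lists may have no common elements; in that case None is fine.

I need this to show Python to someone who is new to it, so the simpler the better.

UPD: the order is not important for my purposes, but let's assume I am looking for the first element in x that also occurs in y.

10 answers: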

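Answer 0: (score: 9)

This should be straightforward and almost as efficient as it gets (for a more efficient solution see Ashwini Chaudhary's answer, and for the most efficient see jamylak's answer and comments):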

result = None
# Go through one list
for i in x:

    # The element repeats in the other list...
    if i in y:

        # Store the result and break the loop
        result = i
        break

Or, more elegantly, encapsulate the same functionality in a function that follows PEP 8-like coding style conventions:

def get_first_common_element(x, y):
    ''' Fetches the first element from x that is common to both lists,
        or returns None if no such element is found.
    '''
    for i in x:
        if i in y:
            return i

    # In case no common element found, you could trigger Exception
    # Or if no common element is _valid_ and common state of your application
    # you could simply return None and test return value
    # raise Exception('No common element found')
    return None
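
The edge cases from the question (an empty list, or no common element) can be checked with a few calls. This is a small sketch for Python 3; the function is repeated so that the block runs on its own, and the second and third inputs are made up for illustration.

```python
def get_first_common_element(x, y):
    """Return the first element of x that also occurs in y, or None."""
    for i in x:
        if i in y:
            return i
    return None

print(get_first_common_element([8, 2, 3, 4, 5], [6, 3, 7, 2, 1]))  # 2
print(get_first_common_element([], [6, 3, 7, 2, 1]))               # None
print(get_first_common_element([8, 4, 5], [6, 7, 1]))              # None
```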

If you want all the common elements, you can do this:

>>> [i for i in x if i in y]
[2, 3]

Answer 1: (score: 8)

Sorting is not the fastest way to do this; it can be done in O(N) time with a set (hash map).

>>> x = [8,2,3,4,5]
>>> y = [6,3,7,2,1]
>>> set_y = set(y)
>>> next((a for a in x if a in set_y), None)
2

Or:

next(ifilter(set(y).__contains__, x), None)

This is what it does:

>>> def foo(x, y):
        seen = set(y)
        for item in x:
            if item in seen:
                return item
        else:
            return None


>>> foo(x, y)
2
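
On Python 3, itertools.ifilter no longer exists, but the built-in filter is already lazy, so the same idea can be sketched as follows. The name first_common is only illustrative and does not come from the answer.

```python
def first_common(x, y):
    # Build the set once so each membership test is O(1) on average.
    contains = set(y).__contains__
    # filter() is lazy in Python 3, so next() stops at the first match.
    return next(filter(contains, x), None)

x = [8, 2, 3, 4, 5]
y = [6, 3, 7, 2, 1]
print(first_common(x, y))   # 2
print(first_common(x, []))  # None
```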

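To show the time differences between the approaches (naive, binary search, and sets), here are some timings. I had to do this to disprove those who believed binary search was faster: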

from itertools import ifilter
from bisect import bisect_left

a = [1, 2, 3, 9, 1, 1] * 100000
b = [44, 11, 23, 9, 10, 99] * 10000

c = [1, 7, 2, 4, 1, 9, 9, 2] * 1000000 # repeats early
d = [7, 6, 11, 13, 19, 10, 19] * 1000000

e = range(50000) 
f = range(40000, 90000) # repeats in the middle

g = [1] * 10000000 # no repeats at all
h = [2] * 10000000

from random import randrange
i = [randrange(10000000) for _ in xrange(5000000)] # some randoms
j = [randrange(10000000) for _ in xrange(5000000)]

def common_set(x, y, ifilter=ifilter, set=set, next=next):
    return next(ifilter(set(y).__contains__, x), None)

def common_b_sort(x, y, bisect=bisect_left, sorted=sorted, min=min, len=len):
    sorted_y = sorted(y)
    for a in x:
        if a == sorted_y[min(bisect_left(sorted_y, a),len(sorted_y)-1)]:
            return a
    else:
        return None

def common_naive(x, y):
    for a in x:
        for b in y:
            if a == b: return a
    else:
        return None

from timeit import timeit
from itertools import repeat
import threading, thread

print 'running tests - time limit of 20 seconds'

for x, y in [('a', 'b'), ('c', 'd'), ('e', 'f'), ('g', 'h'), ('i', 'j')]:
    for func in ('common_set', 'common_b_sort', 'common_naive'):        
        try:
            timer = threading.Timer(20, thread.interrupt_main)   # 20 second time limit
            timer.start()
            res = timeit(stmt="print '[', {0}({1}, {2}), ".format(func, x, y),
                         setup='from __main__ import common_set, common_b_sort, common_naive, {0}, {1}'.format(x, y),
                         number=1)
        except:
            res = "Too long!!"
        finally:
            print '] Function: {0}, {1}, {2}. Time: {3}'.format(func, x, y, res)
            timer.cancel()

The test data was:

a = [1, 2, 3, 9, 1, 1] * 100000
b = [44, 11, 23, 9, 10, 99] * 10000

c = [1, 7, 2, 4, 1, 9, 9, 2] * 1000000 # repeats early
d = [7, 6, 11, 13, 19, 10, 19] * 1000000

e = range(50000) 
f = range(40000, 90000) # repeats in the middle

g = [1] * 10000000 # no repeats at all
h = [2] * 10000000

from random import randrange
i = [randrange(10000000) for _ in xrange(5000000)] # some randoms
j = [randrange(10000000) for _ in xrange(5000000)]

Results:

running tests - time limit of 20 seconds
[ 9 ] Function: common_set, a, b. Time: 0.00569520707241
[ 9 ] Function: common_b_sort, a, b. Time: 0.0182240340602
[ 9 ] Function: common_naive, a, b. Time: 0.00978832505249
[ 7 ] Function: common_set, c, d. Time: 0.249175872911
[ 7 ] Function: common_b_sort, c, d. Time: 1.86735751332
[ 7 ] Function: common_naive, c, d. Time: 0.264309220865
[ 40000 ] Function: common_set, e, f. Time: 0.00966861710078
[ 40000 ] Function: common_b_sort, e, f. Time: 0.0505980508696
[ ] Function: common_naive, e, f. Time: Too long!!
[ None ] Function: common_set, g, h. Time: 1.11300018578
[ None ] Function: common_b_sort, g, h. Time: 14.9472068377
[ ] Function: common_naive, g, h. Time: Too long!!
[ 5411743 ] Function: common_set, i, j. Time: 1.88894859542
[ 5411743 ] Function: common_b_sort, i, j. Time: 6.28617268396
[ 5411743 ] Function: common_naive, i, j. Time: 1.11231867458

This gives you an idea of how it scales to larger inputs: O(N) vs O(N log N) vs O(N^2).
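
The benchmark above is Python 2 code (print statement, xrange, ifilter, thread). A reduced Python 3 sketch of the same three strategies, with much smaller inputs and no time limit, might look like this. The inputs e and f here are arbitrary illustrative choices, not the original test data, and absolute timings will vary from machine to machine.

```python
from bisect import bisect_left
from timeit import timeit

def common_set(x, y):
    # O(N): one pass to build the set, one pass over x.
    seen = set(y)
    return next((a for a in x if a in seen), None)

def common_b_sort(x, y):
    # O(N log N): sort y once, then binary-search it for each element of x.
    sorted_y = sorted(y)
    for a in x:
        pos = bisect_left(sorted_y, a)
        if pos < len(sorted_y) and sorted_y[pos] == a:
            return a
    return None

def common_naive(x, y):
    # O(N^2): compare every element of x with every element of y.
    for a in x:
        for b in y:
            if a == b:
                return a
    return None

# Illustrative inputs: the only shared value (999) comes last in e.
e = list(range(1000))
f = list(range(999, 2000))

for func in (common_set, common_b_sort, common_naive):
    elapsed = timeit(lambda: func(e, f), number=1)
    print(func.__name__, func(e, f), elapsed)
```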

Answer 2: (score: 6)

A one-liner:

x = [8,2,3,4,5]
y = [6,3,7,2,1]

first = next((a for a in x if a in y), None)

Or, more efficiently:

set_y = set(y)
first = next((a for a in x if a in set_y), None)

Or more efficiently but still in one line (don't do this):

first = next((lambda set_y: (a for a in x if a in set_y))(set(y)), None)

Answer 3: (score: 3)

A for loop combined with in has O(N^2) complexity, but you can sort y here and use binary search to improve the time complexity to O(N log N).

def binary_search(lis, num):
    low = 0
    high = len(lis) - 1
    ret = -1  # return -1 if item is not found
    while low <= high:
        mid = (low + high) // 2
        if num < lis[mid]:
            high = mid - 1
        elif num > lis[mid]:
            low = mid + 1
        else:
            ret = mid
            break
    return ret

x = [8,2,3,4,5]
y = [6,3,7,2,1]
y.sort()

for z in x:
    ind = binary_search(y, z)
    if ind != -1:
        print z
        break

Output:

2

The same thing can be done with the bisect module.
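
As a rough sketch of the bisect approach (this is not the answer's original code; it assumes Python 3, and first_common_bisect is a name chosen here for illustration):

```python
from bisect import bisect_left

def first_common_bisect(x, y):
    sorted_y = sorted(y)  # O(N log N); leaves y itself untouched
    for z in x:
        pos = bisect_left(sorted_y, z)  # O(log N) lookup
        # bisect_left returns an insertion point, so confirm an actual match.
        if pos < len(sorted_y) and sorted_y[pos] == z:
            return z
    return None

print(first_common_bisect([8, 2, 3, 4, 5], [6, 3, 7, 2, 1]))  # 2
```

Using sorted() instead of y.sort() is a design choice made in this sketch so that the caller's list keeps its order.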

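Answer 4: (score: 3)

I assume you want to teach this person Python, not just programming. So I do not hesitate to use zip instead of ugly loop variables; it is a very useful part of Python and not hard to explain.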

def first_common(x, y):
    common = set(x) & set(y)
    for current_x, current_y in zip(x, y):
        if current_x in common:
            return current_x
        elif current_y in common:
            return current_y

print first_common([8,2,3,4,5], [6,3,7,2,1])

If you really don't want to use zip, here is how to do it without:

def first_common2(x, y):
    common = set(x) & set(y)
    for i in xrange(min(len(x), len(y))):
        if x[i] in common:
            return x[i]
        elif y[i] in common:
            return y[i]

For those interested, this is how it extends to any number of sequences:

def first_common3(*seqs):
    common = set.intersection(*[set(seq) for seq in seqs])
    for current_elements in zip(*seqs):
        for element in current_elements:
            if element in common:
                return element

Finally, note that, unlike some of the other solutions, this also works when the first common element appears first in the second list.
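
To make that difference concrete, here is a small Python 3 sketch that compares the zip-based version with a plain scan over x. The input lists and the name first_common_in_x are made up for illustration.

```python
def first_common(x, y):
    common = set(x) & set(y)
    # Walk both lists in lockstep and return whichever common element
    # shows up at the earliest position in either list.
    for current_x, current_y in zip(x, y):
        if current_x in common:
            return current_x
        elif current_y in common:
            return current_y

def first_common_in_x(x, y):
    # Only the order of x matters here.
    ys = set(y)
    return next((element for element in x if element in ys), None)

x = [9, 9, 1, 2]
y = [2, 5, 6, 1]
print(first_common(x, y))       # 2: it sits at index 0 of y
print(first_common_in_x(x, y))  # 1: the first element of x that is in y
```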

I just noticed your update, which makes for an even simpler solution:

def first_common4(x, y):
    ys = set(y) # We don't want this to be recreated for each element in x
    for element in x:
        if element in ys:
            return element

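The above is arguably more readable than the generator expression.

Too bad there is no built-in ordered set. It would have made for a more elegant solution.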

Answer 5: (score: 1)

Using for loops seems the easiest to explain to someone new.

for number1 in x:
    for number2 in y:
        if number1 == number2:
            print number1, number2
            print x.index(number1), y.index(number2)
            exit(0)
print "No common numbers found."

NB: not tested, just off the top of my head.

Answer 6: (score: 1)

This one uses sets. It returns the first common element, or None if there is no common element.

def findcommon(x,y):
    common = None
    for i in range(1, max(len(x), len(y)) + 1):  # prefix lengths 1..max, so the last elements are included
        common = set(x[0:i]).intersection(set(y[0:i]))
        if common: break
    return list(common)[0] if common else None

Answer 7: (score: 1)

def first_common_element(x,y):
    common = set(x).intersection(set(y))
    if common:
        return x[min([x.index(i)for i in common])]

Answer 8: (score: 1)

Just for fun (and probably not efficient), another version using itertools:

from itertools import dropwhile, product
from operator import __ne__

def accept_pair(f):
    "Make a version of f that takes a pair instead of 2 arguments."
    def accepting_pair(pair):
        return f(*pair)
    return accepting_pair

def get_first_common(x, y):
    try:
        # I think this *_ unpacking syntax works only in Python 3
        ((first_common, _), *_) = dropwhile(
            accept_pair(__ne__),
            product(x, y))
    except ValueError:
        return None
    return first_common

x = [8, 2, 3, 4, 5]
y = [6, 3, 7, 2, 1]
print(get_first_common(x, y))  # 2
y = [6, 7, 1]
print(get_first_common(x, y))  # None

Using lambda pair: pair[0] != pair[1] instead of accept_pair(__ne__) would be simpler, but less fun.
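
For comparison, the lambda variant could be sketched like this in Python 3. The name get_first_common_lambda is chosen here for illustration, and next() with a default is used in place of the star-unpacking from the answer.

```python
from itertools import dropwhile, product

def get_first_common_lambda(x, y):
    # Skip pairs whose two members differ; the first remaining pair is a match.
    matches = dropwhile(lambda pair: pair[0] != pair[1], product(x, y))
    first_pair = next(matches, None)
    return None if first_pair is None else first_pair[0]

print(get_first_common_lambda([8, 2, 3, 4, 5], [6, 3, 7, 2, 1]))  # 2
print(get_first_common_lambda([8, 2, 3, 4, 5], [6, 7, 1]))        # None
```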

Answer 9: (score: 0)

Using set - this is a generic solution for any number of lists:

def first_common(*lsts):
    common = reduce(lambda c, l: c & set(l), lsts[1:], set(lsts[0]))
    if not common:
        return None
    firsts = [min(lst.index(el) for el in common) for lst in lsts]
    index_in_list = min(firsts)
    trgt_lst_index = firsts.index(index_in_list)
    return lsts[trgt_lst_index][index_in_list]

An afterthought: the above is not an efficient solution; this one cuts down the redundant overhead:

import itertools

def first_common(*lsts):
    common = reduce(lambda c, l: c & set(l), lsts[1:], set(lsts[0]))
    if not common:
        return None
    for lsts_slice in itertools.izip_longest(*lsts):
        slice_intersection = common.intersection(lsts_slice)
        if slice_intersection:
            return slice_intersection.pop()
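
The code above is Python 2 (built-in reduce, itertools.izip_longest). A Python 3 sketch of the same idea, assuming functools.reduce and itertools.zip_longest as the replacements, could be written as below. Note that when two common elements appear in the same column, which one pop() returns is arbitrary, so the example uses input with a single common element.

```python
from functools import reduce
from itertools import zip_longest

def first_common(*lsts):
    if not lsts:
        return None
    # Intersect all lists to find the candidates.
    common = reduce(lambda c, l: c & set(l), lsts[1:], set(lsts[0]))
    if not common:
        return None
    # Walk all lists in lockstep, one "column" of elements at a time;
    # shorter lists are padded with None.
    for column in zip_longest(*lsts):
        hits = common.intersection(column)
        if hits:
            return hits.pop()

print(first_common([8, 2, 3, 4, 5], [6, 7, 2, 1]))  # 2
```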