Algorithm to compare two lists and get the common elements in Python

Time: 2012-10-23 10:13:34

Tags: python

I have two lists that share some common elements:

p = [('link1/d/b/c', 'target1/d/b/c'), ('link2/a/g/c', 'target2/a/g/c'), ..., ('linkn/b/b/f', 'targetn/b/b/f')]

q = [['target1/d/b/c', 'target1', 123, 334], ['targetn/b/b/f', 'targetn', 23, 64], ... ,['targetx/f/f/f', 'targetx', 999, 888]]

I am trying to compare them, find the common elements, and then do some work with the result:

do_job('target1/d/b/c', 'target1', 123, 334, 'link1/d/b/c')

Right now I use a simple and very slow algorithm:

for item in p:
   link = item[0]
   target = item[1]
   for item2 in q:
       target2 = item2[0]
       if target2 == target:
           do_some_job(...)

I know that I need to compare the two lists and build one list that contains all the matched elements, for example:

pq = [['target1/d/b/c', 'target1', 123, 334, 'link1/d/b/c'], ..., ['targetn/b/b/f', 'targetn', 23, 64, 'linkn/b/b/f']]

and then call do_some_job(pq) once, instead of calling it every time I find a matching element.

How can I get it?

Best regards

3 Answers:

Answer 0: (score: 5)

Use chain() to flatten both lists, then use set() and intersection() to get the common elements.

In [78]: from itertools import chain

In [79]: p
Out[79]: 
[('link1/d/b/c', 'target1/d/b/c'),
 ('link2/a/g/c', 'target2/a/g/c'),
 ('linkn/b/b/f', 'targetn/b/b/f')]

In [80]: q
Out[80]: 
[['target1/d/b/c', 'target1', 123, 334],
 ['targetn/b/b/f', 'targetn', 23, 64],
 ['targetx/f/f/f', 'targetx', 999, 888]]

In [81]: set(chain(*p)).intersection(set(chain(*q)))
Out[81]: set(['target1/d/b/c', 'targetn/b/b/f'])
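For readers running this outside IPython, the same steps can be sketched as a plain script. The sample data is copied from the session above; `chain.from_iterable` is used here as an equivalent of `chain(*...)`, and the variable name `common` is only illustrative. Note that the result holds only the shared strings, not the full records from q.

```python
from itertools import chain

# Sample data copied from the session above.
p = [('link1/d/b/c', 'target1/d/b/c'),
     ('link2/a/g/c', 'target2/a/g/c'),
     ('linkn/b/b/f', 'targetn/b/b/f')]
q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetn/b/b/f', 'targetn', 23, 64],
     ['targetx/f/f/f', 'targetx', 999, 888]]

# Flatten both lists and intersect the resulting values.
# intersection() accepts any iterable, so a second set() call is optional.
common = set(chain.from_iterable(p)).intersection(chain.from_iterable(q))

# Sets are unordered; sorting here is only for a stable display.
print(sorted(common))
```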

Or use a list comprehension with short-circuiting:

In [86]: [j for i in p for j in i if j in (z for y in q for z in y)]
Out[86]: ['target1/d/b/c', 'targetn/b/b/f']

Or use any():

In [87]: [j for i in p for j in i if any (j==z for y in q for z in y)]
Out[87]: ['target1/d/b/c', 'targetn/b/b/f']

timeit:

In [93]: %timeit set(chain(*p)).intersection(set(chain(*q)))
100000 loops, best of 3: 7.38 us per loop                     ##  winner

In [94]: %timeit [j for i in p for j in i if j in (z for y in q for z in y)]
10000 loops, best of 3: 24.9 us per loop

In [95]: %timeit [j for i in p for j in i if any (j==z for y in q for z in y)]
10000 loops, best of 3: 27.4 us per loop

In [97]: %timeit [x for x in chain(*p) if x in chain(*q)]
10000 loops, best of 3: 12.6 us per loop

Answer 1: (score: 1)

You should use a dictionary:

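# Map each target to its link.
target_to_link = dict((v, k) for (k, v) in p)
for item in q:
    # Skip entries of q whose target has no link in p (avoids a KeyError).
    if item[0] in target_to_link:
        args = item + [target_to_link[item[0]]]
        do_some_job(*args)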

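The target_to_link dictionary gives you the link that corresponds to each target. Just make sure that no target is shared by several links in p, because the dictionary keeps only the last link stored for a given target...

Inside the for loop, we simply build a temporary argument list args that combines item (e.g. ['target1/d/b/c', 'target1', 123, 334]) with the corresponding link, and we call the function with the function(*args) syntax...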
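The question asks for a single pq list rather than one call per match. A minimal sketch of how the dictionary lookup could produce it, assuming each target appears at most once in p; the body of `do_some_job` is a hypothetical stand-in, since the real function is not shown in the question.

```python
# Sample data copied from the question.
p = [('link1/d/b/c', 'target1/d/b/c'),
     ('link2/a/g/c', 'target2/a/g/c'),
     ('linkn/b/b/f', 'targetn/b/b/f')]
q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetn/b/b/f', 'targetn', 23, 64],
     ['targetx/f/f/f', 'targetx', 999, 888]]

# Map each target to its link (assumes one link per target).
target_to_link = {target: link for link, target in p}

# Keep only the entries of q whose target has a link, and append that link.
pq = [item + [target_to_link[item[0]]]
      for item in q
      if item[0] in target_to_link]


def do_some_job(records):
    """Hypothetical stand-in for the asker's function."""
    for record in records:
        print(record)


do_some_job(pq)
```

Each lookup in the dictionary takes roughly constant time, so the whole pass is about one step per element of p plus one per element of q, instead of the nested loop over both lists.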


If you need to loop over p instead, you can build a dictionary like
target_to_args = dict((k[0],k[1:]) for k in q)

and then do something like
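for (link, target) in p:
    # Skip pairs of p whose target does not appear in q (avoids a KeyError).
    if target in target_to_args:
        args = [target] + target_to_args[target] + [link]
        do_some_job(*args)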
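If several links can point at the same target, a plain dictionary keyed by target keeps only one of them. One possible way around that is to collect the links in a `collections.defaultdict(list)`. The sketch below uses hypothetical data (the pair with 'link9/d/b/c' is invented for illustration) and the illustrative name `links_by_target`.

```python
from collections import defaultdict

# Hypothetical data: two links point at the same target.
p = [('link1/d/b/c', 'target1/d/b/c'),
     ('link9/d/b/c', 'target1/d/b/c'),
     ('linkn/b/b/f', 'targetn/b/b/f')]
q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetn/b/b/f', 'targetn', 23, 64],
     ['targetx/f/f/f', 'targetx', 999, 888]]

# Collect every link for each target instead of overwriting.
links_by_target = defaultdict(list)
for link, target in p:
    links_by_target[target].append(link)

# One output record per (q entry, link) pair.
# .get() is used so that a missing target does not add an empty entry.
pq = [item + [link]
      for item in q
      for link in links_by_target.get(item[0], [])]

for record in pq:
    print(record)
```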

Answer 2: (score: 0)

A list comprehension with chain should work:

from itertools import chain

[x for x in chain(*p) if x in chain(*q)]
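In this comprehension `chain(*q)` is rebuilt and scanned again for every `x`, so the work grows with the product of the two list sizes. A hedged variant that flattens q into a set once and keeps the order of p is sketched below; the names `q_values` and `common` are illustrative, and the sample data is copied from the question.

```python
from itertools import chain

# Sample data copied from the question.
p = [('link1/d/b/c', 'target1/d/b/c'),
     ('link2/a/g/c', 'target2/a/g/c'),
     ('linkn/b/b/f', 'targetn/b/b/f')]
q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetn/b/b/f', 'targetn', 23, 64],
     ['targetx/f/f/f', 'targetx', 999, 888]]

# Build the lookup set once instead of re-flattening q for every x.
q_values = set(chain.from_iterable(q))

# The result follows the order in which values appear in p.
common = [x for x in chain.from_iterable(p) if x in q_values]

print(common)
```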