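Efficiently searching a list of lists of strings for a list of strings in Python

Time: 2012-02-16 18:31:07

Tags: python list

I have a list of lists of strings and a list of strings. For example: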

L1=[["cat","dog","apple"],["orange","green","red"]]
L2=["cat","red"]

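If L1[i] contains any item from L2, I need to produce pairs (for creating edges in a graph). In my example I need the pairs ("cat","dog"), ("cat","apple"), ("red","orange"), ("red","green").

What approach should I use to do this efficiently? (My list L1 is large.)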

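4 Answers:

Answer 0 (score: 2)

This assumes that a sublist of L1 may contain more than one "control" item, that is, more than one item from L2.

I would use set() and itertools.product() to do this:
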
from itertools import product

def generate_edges(iterable, control):
    edges = []
    control_set = set(control)
    for e in iterable:
        e_set = set(e)
        common = e_set & control_set
        to_pair = e_set - common
        edges.extend(product(to_pair, common))
    return edges

Example:

>>> L1 = [["cat","dog","apple"],
...       ["orange","green","red"],
...       ["hand","cat","red"]]
>>> L2 = ["cat","red"]
>>> generate_edges(L1, L2)
[('apple', 'cat'),
 ('dog', 'cat'),
 ('orange', 'red'),
 ('green', 'red'),
 ('hand', 'red'),
 ('hand', 'cat')]
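
Because sets are unordered, the order of the edges shown above can differ from run to run. If a reproducible order matters, one possible sketch keeps each sublist's own order and uses the set only for membership tests. The name generate_edges_ordered is hypothetical and not part of the original answer. Unlike the set-based version, it does not remove duplicates inside a sublist.

```python
def generate_edges_ordered(iterable, control):
    """Hypothetical variant of generate_edges that preserves sublist order."""
    control_set = set(control)  # constant-time membership tests
    edges = []
    for sub in iterable:
        # Split the sublist into control items and everything else,
        # keeping the order in which they appear.
        common = [x for x in sub if x in control_set]
        others = [x for x in sub if x not in control_set]
        # Pair every non-control item with every control item.
        edges.extend((o, c) for o in others for c in common)
    return edges


L1 = [["cat", "dog", "apple"],
      ["orange", "green", "red"],
      ["hand", "cat", "red"]]
L2 = ["cat", "red"]
print(generate_edges_ordered(L1, L2))
# [('dog', 'cat'), ('apple', 'cat'), ('orange', 'red'),
#  ('green', 'red'), ('hand', 'cat'), ('hand', 'red')]
```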

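Answer 1 (score: 1)

I suggest converting everything to sets and using set operations (intersection) to find which items of L2 occur in each item of L1. You can then use set subtraction to get the items that need to be paired.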

edges = []
L2set = set(L2)
for L1item in L1:
    L1set = set(L1item)
    items_in_L1item = L1set & L2set
    for item in items_in_L1item:
        items_to_pair = L1set - set([item])
        edges.extend((item, i) for i in items_to_pair)
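
The snippet above comes without a sample run. As a rough sketch, it can be wrapped in a function (edges_from_control is a hypothetical name) and tried on the data from the question. The result is sorted here only to make the printed order deterministic, since set iteration order is not guaranteed.

```python
def edges_from_control(L1, L2):
    """Hypothetical wrapper around the loop above."""
    L2set = set(L2)
    edges = []
    for L1item in L1:
        L1set = set(L1item)
        # Each control item found in this sublist is paired with
        # every other member of the same sublist.
        for item in L1set & L2set:
            edges.extend((item, other) for other in L1set - {item})
    return edges


L1 = [["cat", "dog", "apple"], ["orange", "green", "red"]]
L2 = ["cat", "red"]
print(sorted(edges_from_control(L1, L2)))
# [('cat', 'apple'), ('cat', 'dog'), ('red', 'green'), ('red', 'orange')]
```

Note that with this approach two control items in the same sublist are also paired with each other, in both directions: ["hand", "cat", "red"] yields ('cat', 'red') as well as ('red', 'cat').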

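Answer 2 (score: 1)

To keep this code efficient even when L1 and L2 are large, use izip, which returns a lazy iterator instead of building a huge list of tuples. If you are using Python 3, use zip.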

from itertools import izip

pairs = []
for my_list, elem in izip(L1, L2):
    if elem in my_list:
        pairs += [(elem, e) for e in my_list if e!=elem]
print pairs

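The code is very easy to understand; it reads almost like plain English! First you loop over each list together with its corresponding element, then you ask whether that element is in the list, and if it is, you add every pair except (x, x) to the result.

Output: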

[('cat', 'dog'), ('cat', 'apple'), ('red', 'orange'), ('red', 'green')]
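
One caveat: zip (and izip) pairs L1[i] with L2[i] by position, so the loop above only tests whether the i-th item of L2 occurs in the i-th sublist of L1. That happens to hold for the example in the question. For the general case, where any item of L2 may occur in any sublist, a possible Python 3 sketch is shown below. The name iter_pairs is hypothetical; it is written as a generator so that no large intermediate list is built.

```python
def iter_pairs(L1, L2):
    """Hypothetical generator: yield (control_item, other_item) pairs lazily."""
    control = set(L2)  # constant-time membership tests
    for my_list in L1:
        for elem in my_list:
            if elem in control:
                for e in my_list:
                    if e != elem:
                        yield (elem, e)


L1 = [["cat", "dog", "apple"], ["orange", "green", "red"]]
L2 = ["red", "cat"]  # the order no longer has to line up with L1
print(list(iter_pairs(L1, L2)))
# [('cat', 'dog'), ('cat', 'apple'), ('red', 'orange'), ('red', 'green')]
```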

Answer 3 (score: 1)

If L1 is very large, you may want to consider using bisect. It requires you to flatten and sort L1 first. You can do it like this:

from bisect import bisect_left, bisect_right
from itertools import chain

L1=[["cat","dog","apple"],["orange","green","red","apple"]]
L2=["apple", "cat","red"]

M1 = [[i]*len(j) for i, j in enumerate(L1)]
M1 = list(chain(*M1))
L1flat = list(chain(*L1))
I = sorted(range(len(L1flat)), key=L1flat.__getitem__)
L1flat = [L1flat[i] for i in I]
M1 = [M1[i] for i in I]

for item in L2:
    s = bisect_left(L1flat, item)
    e = bisect_right(L1flat, item)
    print item, M1[s:e]

#apple [0, 1]
#cat [0]
#red [1]
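
The loop above only reports which sublists contain each item of L2. As a rough sketch of the remaining step, the indices found by bisect can be turned into the edges the question asks for. This is written for Python 3, and build_index and edges_for are hypothetical names. The index is built once by sorting, after which each lookup costs O(log n).

```python
from bisect import bisect_left, bisect_right


def build_index(L1):
    """Return the sorted flat items and a parallel list of sublist indices."""
    tagged = sorted((item, i) for i, sub in enumerate(L1) for item in sub)
    flat = [item for item, _ in tagged]
    owners = [i for _, i in tagged]
    return flat, owners


def edges_for(L1, L2):
    """Hypothetical helper: pair each L2 item with its sublist neighbours."""
    flat, owners = build_index(L1)
    edges = []
    for item in L2:
        # All occurrences of item sit in the slice flat[s:e].
        s = bisect_left(flat, item)
        e = bisect_right(flat, item)
        for i in owners[s:e]:
            edges.extend((item, other) for other in L1[i] if other != item)
    return edges


L1 = [["cat", "dog", "apple"], ["orange", "green", "red", "apple"]]
L2 = ["apple", "cat", "red"]
print(edges_for(L1, L2))
# [('apple', 'cat'), ('apple', 'dog'), ('apple', 'orange'),
#  ('apple', 'green'), ('apple', 'red'), ('cat', 'dog'),
#  ('cat', 'apple'), ('red', 'orange'), ('red', 'green'), ('red', 'apple')]
```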