Question

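I have a list of strings, ls = ['a','b','c'], and another list of larger strings, each guaranteed to contain one and only one string from ls: ls2 = ['1298a', 'eebbbd', 'qcqcq321'].

For a given string from ls2, how can I find the index of the corresponding string from ls?

I can use: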

for s in ls:
    for ss in ls2:
        if s in ss:
            print (s,ss,ls.index(s))

a 1298a 0
b eebbbd 1
c qcqcq321 2

But is there something better?

Edit (hopefully clarifying):

The actual case I am working with has a larger first list and a smaller second one:

ls  = ['apo','b','c','d25','egg','f','g']
ls2 = ['apoip21', 'oiujohuid25']

I expect the result 0,3, because the first item in ls2 contains the first item of ls, and the second item in ls2 contains the fourth item of ls.

Answer 1

Without more information about your data, it does not look like you can get rid of the O(m * n * p) complexity (where m = len(ls), n = len(ls2) and p = max(map(len, ls2))). You can definitely bring your current loop down from O(m² * n * p) by keeping track of the current index with enumerate. Also, don't forget about early termination:

for string in ls2:
    for index, key in enumerate(ls):
        if key in string:
            print(key, string, index)
            break

Note that I swapped the inner and outer loops to get the break to work properly: you definitely want to check every element of ls2, but only the minimum number of elements of ls.

Here are some timings I accumulated on the different O(m * n * p) solutions presented here. Thanks to @thierry-lathuille for the test data:

import itertools

ls = ['g', 'j', 'z', 'a', 'rr', 'ttt', 'b', 'c', 'd', 'f']
ls2 = ['1298a', 'eebbb', 'qcqcq321', 'mlkjmd', 'dùmlk', 'lof', 'erreee', 'bmw', 'ottt', 'jllll', 'lla' ]

def with_table():
    # build the key -> index lookup table once
    table = {key: index for index, key in enumerate(ls)}
    result = {}
    for string in ls2:
        for key in ls:
            if key in string:
                result[string] = table[key]
    return result

def with_enumerate():
    result = {}
    for string in ls2:
        for index, key in enumerate(ls):
            if key in string:
                result[string] = index
                break
    return result

def with_dict_comp():
    return {string: index for string in ls2 for index, key in enumerate(ls) if key in string}

def with_itertools():
    result = {}
    for (index, key), string in itertools.product(enumerate(ls), ls2):
        if key in string:
            result[string] = index
    return result

%timeit with_table()
4.89 µs ± 61.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit with_enumerate()
5.27 µs ± 66.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit with_dict_comp()
6.9 µs ± 83.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit with_itertools()
17.5 ns ± 0.193 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

It turns out that creating a lookup table for the indices is slightly faster than computing them on the fly with enumerate.

Answer 2

Using a dictionary comprehension:

ls  = ['apo','b','c','d25','egg','f','g']
ls2 = ['apoip21', 'oiujohuid25']

result = { string : index for index, i in enumerate(ls) for string in ls2 if i in string }
# {'apoip21': 0, 'oiujohuid25': 3}
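The question asked for the indices themselves (0,3), in the order of ls2, rather than a dict. A minimal sketch of one way to get that, assuming a hypothetical helper name find_indices and using None as a marker for strings that contain no key (the question guarantees this does not happen, so that part is only a safeguard):

```python
def find_indices(keys, strings):
    """Return, for each string, the index of the first key it contains.

    find_indices is a hypothetical helper name, not taken from any answer.
    """
    indices = []
    for string in strings:
        match = None  # assumption: None marks "no key found"
        for index, key in enumerate(keys):
            if key in string:
                match = index
                break  # exactly one key is expected per string
        indices.append(match)
    return indices


ls = ['apo', 'b', 'c', 'd25', 'egg', 'f', 'g']
ls2 = ['apoip21', 'oiujohuid25']
print(find_indices(ls, ls2))  # [0, 3]
```

Because the outer loop runs over the strings, the result follows the order of ls2 even when the matches appear in a different order in ls.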

Answer 3

Your code has a time complexity of O(n^4); you can bring it down to O(n^3) by using a dict.

ls = ['a','b','c']
ls2 = ['1298a', 'eebbbd', 'qcqcq321']
word_dict=dict()
for i in range(len(ls)):    # time complexity O(n)
    word_dict[ls[i]] = i
for s in ls:    #O(n)
    for ss in ls2:  #O(n)
        if s in ss: #O(n)
            print(s,ss,word_dict[s]) #O(1)

Answer 4

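So, if I interpret the request for "better" as "written more compactly", then I suggest a list comprehension that contains all the loops and the condition: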

>>> ls  = ['apo','b','c','d25','egg','f','g']
>>> ls2 = ['apoip21', 'oiujohuid25']

>>> [ls.index(s) for s in ls for s2 in ls2 if s in s2]
[0, 3]

But if "better" is understood as "less complex", then this does not improve the complexity...
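If compactness is the goal but the repeated ls.index call and the lack of early termination are a concern, one possible variant combines next with a generator expression. This is only a sketch, not the answerer's code; the None default for strings without a match is an assumption added here:

```python
ls = ['apo', 'b', 'c', 'd25', 'egg', 'f', 'g']
ls2 = ['apoip21', 'oiujohuid25']

# next() stops at the first matching key, so the rest of ls is not scanned;
# None is an assumed default for strings that contain no key at all.
indices = [next((i for i, s in enumerate(ls) if s in s2), None) for s2 in ls2]
print(indices)  # [0, 3]
```

Here the outer loop runs over ls2, so the indices come out in the order of ls2 rather than in the order of ls.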

Answer 5

You can use the following two improvements:

ls = ['a','b','c']
ls2 = ['1298a', 'eebbbd', 'qcqcq321']

# preprocess ls to avoid calling ls.index each time:
indices = {ss:index for index, ss in enumerate(ls)}

for s in ls2:
    for ss in ls:
        if ss in s:
            print(ss, s, indices[ss])
            # as s is guaranteed to include only one of the substrings,
            # we don't have to test the other substrings once we found a match
            break


# a 1298a 0
# b eebbbd 1
# c qcqcq321 2
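One detail to keep in mind when replacing ls.index with a lookup dict: if ls contained duplicate entries, the dict comprehension would keep the last index, while ls.index returns the first. The question does not say whether duplicates can occur, so the data below is purely hypothetical:

```python
ls = ['a', 'b', 'a']  # hypothetical data with a duplicated key

# A dict comprehension overwrites earlier entries: the last index wins.
last_wins = {s: i for i, s in enumerate(ls)}
print(last_wins['a'])  # 2
print(ls.index('a'))   # 0

# setdefault only stores a key the first time it is seen: the first index wins,
# which matches the behaviour of ls.index.
first_wins = {}
for i, s in enumerate(ls):
    first_wins.setdefault(s, i)
print(first_wins['a'])  # 0
```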

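Some timings:

Breaking out of the loop once a match is found improves the speed in all cases. The overhead of creating the dict of indices makes that version slower for very small lists, but it is already faster for lists as short as these: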

ls = ['g', 'j', 'z', 'a', 'rr', 'ttt', 'b', 'c', 'd', 'f']
ls2 = ['1298a', 'eebbb', 'qcqcq321', 'mlkjmd', 'dùmlk', 'lof', 'erreee', 'bmw', 'ottt', 'jllll', 'lla' ]

def original():
    for s in ls:
        for ss in ls2:
            if s in ss:
                out = (s,ss,ls.index(s))


def with_break():

    for s in ls2:
        for ss in ls:
            if ss in s:
                out = (ss, s, ls.index(ss))
                # as s is guaranteed to include only one of the substrings,
                # we don't have to test the other substrings once we found a match
                break


def with_break_and_dict():
    # preprocess ls to avoid calling ls.index each time:
    indices = {ss:index for index, ss in enumerate(ls)}

    for s in ls2:
        for ss in ls:
            if ss in s:
                out = (ss, s, indices[ss])
                # as s is guaranteed to include only one of the substrings,
                # we don't have to test the other substrings once we found a match
                break

Timing results:

%timeit original()
%timeit with_break()
%timeit with_break_and_dict()


# 100000 loops, best of 3: 12.8 µs per loop
# 100000 loops, best of 3: 9.5 µs per loop
# 100000 loops, best of 3: 8.49 µs per loop
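%timeit is an IPython magic, so the commands above only work in IPython or Jupyter. A rough sketch of a comparable measurement in a plain Python script, using the standard timeit module, might look like this. The functions are condensed versions of the ones above that return their results so they can be compared; the loop count of 10,000 is an arbitrary choice, and the numbers will differ from machine to machine:

```python
import timeit

ls = ['g', 'j', 'z', 'a', 'rr', 'ttt', 'b', 'c', 'd', 'f']
ls2 = ['1298a', 'eebbb', 'qcqcq321', 'mlkjmd', 'dùmlk', 'lof',
       'erreee', 'bmw', 'ottt', 'jllll', 'lla']


def with_break():
    out = []
    for s in ls2:
        for ss in ls:
            if ss in s:
                out.append((ss, s, ls.index(ss)))
                break  # one match per string is guaranteed
    return out


def with_break_and_dict():
    # preprocess ls to avoid calling ls.index each time
    indices = {ss: index for index, ss in enumerate(ls)}
    out = []
    for s in ls2:
        for ss in ls:
            if ss in s:
                out.append((ss, s, indices[ss]))
                break
    return out


number = 10_000  # arbitrary loop count, chosen to keep the run short
for func in (with_break, with_break_and_dict):
    total = timeit.timeit(func, number=number)  # total seconds for all loops
    print(f"{func.__name__}: {total / number * 1e6:.2f} µs per loop")
```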

Finding the index of the element corresponding to another list

5 answers: