Finding the indices of elements that correspond to another list

Date: 2018-06-07 13:45:20

Tags: python string list

I have a list of strings, ls = ['a','b','c'], and another list of larger strings, each of which is guaranteed to contain one and only one string from ls: ls2 = ['1298a', 'eebbbd', 'qcqcq321']

For a given string from ls2, how can I find the index of the corresponding string in ls?

I can use:

for s in ls:
    for ss in ls2:
        if s in ss:
            print(s, ss, ls.index(s))

a 1298a 0
b eebbbd 1
c qcqcq321 2

But is there something better?

Edit (hopefully clarifying):

The real case I'm working on has a larger first list and a smaller second one:

ls  = ['apo','b','c','d25','egg','f','g']
ls2 = ['apoip21', 'oiujohuid25']

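I would like to get 0, 3 as the result, because the 1st item in ls2 contains the 1st item in ls, and the 2nd item in ls2 contains the 4th item in ls.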
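A minimal sketch for the clarified case, assuming (as the question states) that each string in ls2 contains exactly one key from ls. It returns one index per ls2 string, in ls2 order; `indices_in_ls2_order` is a hypothetical helper name, not something from the thread.

```python
def indices_in_ls2_order(ls, ls2):
    """For each string in ls2, return the index of the ls element it contains."""
    result = []
    for string in ls2:
        # next() stops at the first matching key; None means nothing matched
        result.append(next((i for i, key in enumerate(ls) if key in string), None))
    return result


ls = ['apo', 'b', 'c', 'd25', 'egg', 'f', 'g']
ls2 = ['apoip21', 'oiujohuid25']
print(indices_in_ls2_order(ls, ls2))  # [0, 3]
```

Returning None instead of raising makes a violated guarantee visible without crashing; raising an exception would be an equally reasonable design choice.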

5 answers:

Answer 0 (score: 2)

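Without more information about your data, it doesn't seem like you can get away from O(m * n * p) complexity (m = len(ls), n = len(ls2), p = max(map(len, ls2))). You can definitely reduce your current loop from O(m² * n * p) to O(m * n * p) by using enumerate to keep track of the current index. Also, don't forget about early termination:

for string in ls2:
    for index, key in enumerate(ls):
        if key in string:
            print(key, string, index)
            break

Note that I swapped the inner and outer loops so that break works properly: you definitely want to check every element of ls2, but only the minimum number of elements in ls.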
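To make the effect of swapping the loops and adding break concrete, here is a small sketch that counts the substring tests (`key in string`) performed by each loop order on the question's first example. The functions `scan_original` and `scan_with_break` are hypothetical, written only for this illustration.

```python
def scan_original(ls, ls2):
    """Original loop order (keys outside, strings inside), no early exit."""
    tests, found = 0, {}
    for s in ls:
        for ss in ls2:
            tests += 1  # one `s in ss` check per (key, string) pair
            if s in ss:
                found[ss] = ls.index(s)
    return tests, found


def scan_with_break(ls, ls2):
    """Swapped loop order with enumerate and an early break."""
    tests, found = 0, {}
    for string in ls2:
        for index, key in enumerate(ls):
            tests += 1
            if key in string:
                found[string] = index
                break  # safe only because each string contains exactly one key
    return tests, found


ls = ['a', 'b', 'c']
ls2 = ['1298a', 'eebbbd', 'qcqcq321']
print(scan_original(ls, ls2)[0], scan_with_break(ls, ls2)[0])  # 9 6
```

Both versions find the same mapping; the swapped version simply stops scanning ls as soon as each string's key is found.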

Here are some timings I've accumulated for the different O(m * n * p) solutions presented here. Thanks to @thierry-lathuille for the test data:

import itertools

ls = ['g', 'j', 'z', 'a', 'rr', 'ttt', 'b', 'c', 'd', 'f']
ls2 = ['1298a', 'eebbb', 'qcqcq321', 'mlkjmd', 'dùmlk', 'lof', 'erreee', 'bmw', 'ottt', 'jllll', 'lla' ]

def with_table():
    table = {key: index for index, key in enumerate(ls)}
    result = {}
    for string in ls2:
        for key in ls:
            if key in string:
                result[string] = table[key]
    return result

def with_enumerate():
    result = {}
    for string in ls2:
        for index, key in enumerate(ls):
            if key in string:
                result[string] = index
                break
    return result

def with_dict_comp():
    return {string: index for string in ls2 for index, key in enumerate(ls) if key in string}

def with_itertools():
    result = {}
    for (index, key), string in itertools.product(enumerate(ls), ls2):
        if key in string:
            result[string] = index
    return result

%timeit with_table()
4.89 µs ± 61.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit with_enumerate()
5.27 µs ± 66.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit with_dict_comp()
6.9 µs ± 83.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit with_itertools()
17.5 ns ± 0.193 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

It turns out that creating a lookup table for the indices is slightly faster than computing them on the fly with enumerate.

Answer 1 (score: 0)

Using a dictionary comprehension:

ls  = ['apo','b','c','d25','egg','f','g']
ls2 = ['apoip21', 'oiujohuid25']

result = { string : index for index, i in enumerate(ls) for string in ls2 if i in string }
# {'apoip21': 0, 'oiujohuid25': 3}

Answer 2 (score: 0)

Your code has O(n^4) time complexity; you can bring it down to O(n^3) by using a dict.

ls = ['a','b','c']
ls2 = ['1298a', 'eebbbd', 'qcqcq321']
word_dict = dict()
for i in range(len(ls)):    # time complexity O(n)
    word_dict[ls[i]] = i
for s in ls:    #O(n)
    for ss in ls2:  #O(n)
        if s in ss: #O(n)
            print(s,ss,word_dict[s]) #O(1)
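One behavioral difference is worth checking before replacing ls.index with a dict: if ls can contain duplicates, ls.index returns the first occurrence, while a dict built by plain assignment keeps the last one. The question's lists have no duplicates, so this sketch uses a hypothetical list that does.

```python
ls = ['a', 'b', 'a']  # hypothetical input with a duplicate key

# Assignment in index order: the later duplicate overwrites the earlier index.
last_wins = {word: i for i, word in enumerate(ls)}

# setdefault only stores the first index seen, matching ls.index.
first_wins = {}
for i, word in enumerate(ls):
    first_wins.setdefault(word, i)

print(ls.index('a'), last_wins['a'], first_wins['a'])  # 0 2 0
```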

Answer 3 (score: 0)

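So, if I interpret your request for "better" as "written more compactly", I'd suggest a list comprehension that contains all the loops and conditions: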

>>> ls  = ['apo','b','c','d25','egg','f','g']
>>> ls2 = ['apoip21', 'oiujohuid25']

>>> [ls.index(s) for s in ls for s2 in ls2 if s in s2]
[0, 3]

But if "better" means "less complex", this does not improve the complexity...
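If lower complexity is the goal, one different technique (not taken from any answer in this thread) is to stop looping over ls altogether: put the keys in a dict and, for each string in ls2, look up every substring whose length equals one of the key lengths. The work per string then depends on the string's length and on the number of distinct key lengths rather than on len(ls), which may pay off when ls is large but its keys have only a few distinct lengths. This is a sketch assuming the question's guarantee of exactly one key per string; `find_by_substring_lookup` is a hypothetical name.

```python
def find_by_substring_lookup(ls, ls2):
    """Map each string in ls2 to the index of the ls key it contains."""
    # First occurrence wins, matching ls.index if ls has duplicates.
    table = {}
    for i, key in enumerate(ls):
        table.setdefault(key, i)
    lengths = sorted(set(map(len, ls)))

    result = {}
    for string in ls2:
        for k in lengths:
            # Check every window of length k with an average O(1) dict lookup.
            hit = next((table[string[i:i + k]]
                        for i in range(len(string) - k + 1)
                        if string[i:i + k] in table), None)
            if hit is not None:
                result[string] = hit
                break
    return result


ls = ['apo', 'b', 'c', 'd25', 'egg', 'f', 'g']
ls2 = ['apoip21', 'oiujohuid25']
print(find_by_substring_lookup(ls, ls2))  # {'apoip21': 0, 'oiujohuid25': 3}
```

Under the guarantee, the order in which key lengths are tried does not affect the result; if a string could contain several keys, this version would report a match of the shortest length first, unlike the loops over ls.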

Answer 4 (score: 0)

You can make the following two improvements:

ls = ['a','b','c']
ls2 = ['1298a', 'eebbbd', 'qcqcq321']

# preprocess ls to avoid calling ls.index each time:
indices = {ss:index for index, ss in enumerate(ls)}

for s in ls2:
    for ss in ls:
        if ss in s:
            print(ss, s, indices[ss])
            # as s is guaranteed to include only one of the substrings,
            # we don't have to test the other substrings once we found a match
            break


# a 1298a 0
# b eebbbd 1
# c qcqcq321 2     

Some timings:

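Breaking out of the loop once a match is found always improves speed. The overhead of creating the dict of indices makes that version slower for very small lists, but it is already faster with lists as short as these: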

ls = ['g', 'j', 'z', 'a', 'rr', 'ttt', 'b', 'c', 'd', 'f']
ls2 = ['1298a', 'eebbb', 'qcqcq321', 'mlkjmd', 'dùmlk', 'lof', 'erreee', 'bmw', 'ottt', 'jllll', 'lla' ]

def original():
    for s in ls:
        for ss in ls2:
            if s in ss:
                out = (s,ss,ls.index(s))


def with_break():

    for s in ls2:
        for ss in ls:
            if ss in s:
                out = (ss, s, ls.index(ss))
                # as s is guaranteed to include only one of the substrings,
                # we don't have to test the other substrings once we found a match
                break


def with_break_and_dict():
    # preprocess ls to avoid calling ls.index each time:
    indices = {ss:index for index, ss in enumerate(ls)}

    for s in ls2:
        for ss in ls:
            if ss in s:
                out = (ss, s, indices[ss])
                # as s is guaranteed to include only one of the substrings,
                # we don't have to test the other substrings once we found a match
                break

Timing results:

%timeit original()
%timeit with_break()
%timeit with_break_and_dict()


# 100000 loops, best of 3: 12.8 µs per loop
# 100000 loops, best of 3: 9.5 µs per loop
# 100000 loops, best of 3: 8.49 µs per loop
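%timeit is an IPython magic; in a plain Python script, the standard library's timeit module can take a comparable measurement. This sketch times two of the variants above on the same test data. Here `with_break` and `with_break_and_dict` are adapted to return their matches instead of assigning to an unused `out`, so the two can be checked for identical results first. No figures are shown because they depend on the machine and Python version.

```python
import timeit

ls = ['g', 'j', 'z', 'a', 'rr', 'ttt', 'b', 'c', 'd', 'f']
ls2 = ['1298a', 'eebbb', 'qcqcq321', 'mlkjmd', 'dùmlk', 'lof',
       'erreee', 'bmw', 'ottt', 'jllll', 'lla']


def with_break():
    result = {}
    for s in ls2:
        for ss in ls:
            if ss in s:
                result[s] = ls.index(ss)
                break
    return result


def with_break_and_dict():
    indices = {ss: index for index, ss in enumerate(ls)}
    result = {}
    for s in ls2:
        for ss in ls:
            if ss in s:
                result[s] = indices[ss]
                break
    return result


if __name__ == '__main__':
    # Both variants must agree before their speed is worth comparing.
    assert with_break() == with_break_and_dict()
    for func in (with_break, with_break_and_dict):
        # Best of 5 repeats, each calling the function 1,000 times.
        best = min(timeit.repeat(func, number=1_000, repeat=5))
        print(f'{func.__name__}: {best / 1_000 * 1e6:.2f} µs per call')
```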