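I have a list of strings: ls = ['a','b','c']
and another list of larger strings, each guaranteed to contain one and only one string from ls: ls2 = ['1298a', 'eebbbd', 'qcqcq321'].
For a given string from ls2, how can I find the index of the corresponding string in ls?
I can use: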
for s in ls:
    for ss in ls2:
        if s in ss:
            print(s, ss, ls.index(s))
a 1298a 0
b eebbbd 1
c qcqcq321 2
But is there something better?
Edit (hopefully clarifying):
The actual case I am working with has a much larger first list and a smaller second one:
ls = ['apo','b','c','d25','egg','f','g']
ls2 = ['apoip21', 'oiujohuid25']
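I would like to get the result 0, 3, because the first item in ls2 contains the first item of ls, while the second item in ls2 contains the fourth item of ls.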
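Answer 0 (score: 2)
Without more information about your data, it seems you cannot get rid of the O(m * n * p) complexity (m = len(ls), n = len(ls2), p = max(map(len, ls2))). You can definitely reduce your current loop from O(m^2 * n * p) by keeping track of the current index with enumerate. Also, don't forget about early termination:
for string in ls2:
    for index, key in enumerate(ls):
        if key in string:
            print(key, string, index)
            break
Note that I swapped the inner and outer loops to get the break to work properly: you definitely want to check every element of ls2, but only the minimum number of elements in ls.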
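The loop above can be wrapped in a small function that returns the indices in ls2 order. This is only a sketch: the name find_key_indices and the choice to raise ValueError when no key matches are assumptions made for illustration, not part of the original answer.

```python
def find_key_indices(keys, strings):
    """Return, for each string, the index of the first key it contains."""
    indices = []
    for string in strings:
        for index, key in enumerate(keys):
            if key in string:
                indices.append(index)
                break  # exactly one key is expected, so stop scanning
        else:
            # Runs only when the inner loop finished without hitting `break`.
            raise ValueError(f"no key found in {string!r}")
    return indices


ls = ['apo', 'b', 'c', 'd25', 'egg', 'f', 'g']
ls2 = ['apoip21', 'oiujohuid25']
print(find_key_indices(ls, ls2))  # [0, 3]
```

The for/else clause makes a violated "exactly one key" guarantee visible instead of silently skipping the string.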
Here are some timings I accumulated for the different O(m * n * p) solutions presented here. Thanks to @thierry-lathuille for the test data:
import itertools

ls = ['g', 'j', 'z', 'a', 'rr', 'ttt', 'b', 'c', 'd', 'f']
ls2 = ['1298a', 'eebbb', 'qcqcq321', 'mlkjmd', 'dùmlk', 'lof',
       'erreee', 'bmw', 'ottt', 'jllll', 'lla']

def with_table():
    # Precompute key -> index once instead of calling ls.index() per match.
    table = {key: index for index, key in enumerate(ls)}
    result = {}
    for string in ls2:
        for key in ls:
            if key in string:
                result[string] = table[key]
    return result

def with_enumerate():
    result = {}
    for string in ls2:
        for index, key in enumerate(ls):
            if key in string:
                result[string] = index
                break
    return result

def with_dict_comp():
    return {string: index for string in ls2 for index, key in enumerate(ls) if key in string}

def with_itertools():
    result = {}
    for (index, key), string in itertools.product(enumerate(ls), ls2):
        if key in string:
            result[string] = index
    return result
%timeit with_table()
4.89 µs ± 61.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit with_enumerate()
5.27 µs ± 66.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit with_dict_comp()
6.9 µs ± 83.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit with_itertools()
17.5 ns ± 0.193 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
It turns out that creating a lookup table for the indices is slightly faster than computing them on the fly with enumerate.
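The %timeit lines above only work inside IPython. Outside it, a comparison along the same lines could be sketched with the standard timeit module. The helper best_time_us and its repeat and number values are assumptions chosen for illustration, and the measured numbers will vary by machine.

```python
import timeit

ls = ['g', 'j', 'z', 'a', 'rr', 'ttt', 'b', 'c', 'd', 'f']
ls2 = ['1298a', 'eebbb', 'qcqcq321', 'mlkjmd', 'dùmlk', 'lof',
       'erreee', 'bmw', 'ottt', 'jllll', 'lla']


def with_table():
    # Lookup table built once; no early exit, so the last matching key wins.
    table = {key: index for index, key in enumerate(ls)}
    result = {}
    for string in ls2:
        for key in ls:
            if key in string:
                result[string] = table[key]
    return result


def with_enumerate():
    # Index tracked by enumerate; stops at the first matching key.
    result = {}
    for string in ls2:
        for index, key in enumerate(ls):
            if key in string:
                result[string] = index
                break
    return result


def best_time_us(func, repeat=5, number=10_000):
    """Best per-call time in microseconds over `repeat` runs (hypothetical helper)."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number * 1e6


timings = {f.__name__: best_time_us(f) for f in (with_table, with_enumerate)}
for name, microseconds in timings.items():
    print(f"{name}: {microseconds:.2f} µs per call")
```

One caveat when comparing these functions: 'mlkjmd' in the test data contains both 'j' and 'd', so the version without break records the last matching key while the version with break records the first.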
Answer 1 (score: 0)
Using a dictionary comprehension:
ls = ['apo','b','c','d25','egg','f','g']
ls2 = ['apoip21', 'oiujohuid25']
result = { string : index for index, i in enumerate(ls) for string in ls2 if i in string }
# {'apoip21': 0, 'oiujohuid25': 3}
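Because this comprehension iterates over ls in its outer loop, the dictionary is filled in ls order rather than ls2 order. The sketch below, assuming Python 3.7+ (where dicts preserve insertion order), reverses ls2 to make that visible and then reads the indices back in ls2 order. The variable name indices_in_ls2_order is illustrative.

```python
ls = ['apo', 'b', 'c', 'd25', 'egg', 'f', 'g']
ls2 = ['oiujohuid25', 'apoip21']  # same strings as above, reversed

result = {string: index
          for index, key in enumerate(ls)
          for string in ls2
          if key in string}

# Keys appear in the order their matching entries occur in ls.
print(list(result))  # ['apoip21', 'oiujohuid25']

# Look the strings up again to recover the indices in ls2 order.
indices_in_ls2_order = [result[string] for string in ls2]
print(indices_in_ls2_order)  # [3, 0]
```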
Answer 2 (score: 0)
Your code's time complexity is O(n^4); you can use a dict to bring it down to O(n^3).
ls = ['a','b','c']
ls2 = ['1298a', 'eebbbd', 'qcqcq321']
word_dict = dict()
for i in range(len(ls)):  # time complexity O(n)
    word_dict[ls[i]] = i
for s in ls:  # O(n)
    for ss in ls2:  # O(n)
        if s in ss:  # O(n)
            print(s, ss, word_dict[s])  # O(1)
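Answer 3 (score: 0)
So, if I interpret the request for "better" as "written more compactly", then I would suggest a list comprehension that contains all the loops and the condition: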
>>> ls = ['apo','b','c','d25','egg','f','g']
>>> ls2 = ['apoip21', 'oiujohuid25']
>>> [ls.index(s) for s in ls for s2 in ls2 if s in s2]
[0, 3]
But if "better" is understood as "less complex", then this does not improve the complexity...
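A compact variant that avoids the repeated ls.index call and stops at the first match for each string could look like the sketch below. It swaps in next() over a generator expression, which is not what this answer uses, and the None default for strings without any match is an assumption added for illustration.

```python
ls = ['apo', 'b', 'c', 'd25', 'egg', 'f', 'g']
ls2 = ['apoip21', 'oiujohuid25']

# For each string in ls2, take the index of the first key found in it.
# next() stops consuming the generator at the first hit; None marks "no match".
indices = [next((i for i, key in enumerate(ls) if key in s2), None)
           for s2 in ls2]
print(indices)  # [0, 3]
```

Here the results follow the order of ls2, whereas the comprehension above follows the order of ls.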
Answer 4 (score: 0)
You can use the following two improvements:
ls = ['a','b','c']
ls2 = ['1298a', 'eebbbd', 'qcqcq321']
# preprocess ls to avoid calling ls.index each time:
indices = {ss:index for index, ss in enumerate(ls)}
for s in ls2:
    for ss in ls:
        if ss in s:
            print(ss, s, indices[ss])
            # as s is guaranteed to include only one of the substrings,
            # we don't have to test the other substrings once we found a match
            break
# a 1298a 0
# b eebbbd 1
# c qcqcq321 2
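Some timings:
Breaking out of the loop once a match is found always improves the speed. The overhead of creating the dict of indices makes that version slower for very small lists, but it is already faster with lists shorter than the ones used here: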
ls = ['g', 'j', 'z', 'a', 'rr', 'ttt', 'b', 'c', 'd', 'f']
ls2 = ['1298a', 'eebbb', 'qcqcq321', 'mlkjmd', 'dùmlk', 'lof', 'erreee', 'bmw', 'ottt', 'jllll', 'lla' ]
def original():
    for s in ls:
        for ss in ls2:
            if s in ss:
                out = (s, ss, ls.index(s))

def with_break():
    for s in ls2:
        for ss in ls:
            if ss in s:
                out = (ss, s, ls.index(ss))
                # as s is guaranteed to include only one of the substrings,
                # we don't have to test the other substrings once we found a match
                break

def with_break_and_dict():
    # preprocess ls to avoid calling ls.index each time:
    indices = {ss: index for index, ss in enumerate(ls)}
    for s in ls2:
        for ss in ls:
            if ss in s:
                out = (ss, s, indices[ss])
                # as s is guaranteed to include only one of the substrings,
                # we don't have to test the other substrings once we found a match
                break
Timing results:
%timeit original()
%timeit with_break()
%timeit with_break_and_dict()
# 100000 loops, best of 3: 12.8 µs per loop
# 100000 loops, best of 3: 9.5 µs per loop
# 100000 loops, best of 3: 8.49 µs per loop