What is the best-performing way to check whether a string contains a substring in Python?

Time: 2014-04-06 04:49:58

Tags: python string

I know I can do this:

word = 'shrimp'
sentence = 'They eat shrimp in Mexico, and even more these days'
word in sentence

and it evaluates to True.

But what if I have:

words = ['a','la','los','shrimp']

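How can I check whether the sentence contains any of the elements of words?

I care about performance because the list could be huge, and I don't want to pad the code with explicit loops.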

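4 answers:

Answer 0: (score: 7)

This reads like English, but it may not be the most performant option. Benchmark first, and switch to an optimized solution only if you need to.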

any(word in sentence for word in words)
any returns True if any element is True, and False otherwise. (word in sentence for word in words) is an iterable that yields, for each word, whether it occurs in the sentence.
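Because any consumes the generator lazily, it stops at the first match instead of testing every word. The sketch below makes that visible; the `contains` helper and the `checked` list are hypothetical names added only to record which words were actually tested.

```python
words = ['a', 'la', 'los', 'shrimp']
sentence = 'They eat shrimp in Mexico, and even more these days'

checked = []  # hypothetical: records every word that gets tested


def contains(word):
    """Substring test that also logs the word it was asked about."""
    checked.append(word)
    return word in sentence


found = any(contains(word) for word in words)

print(found)    # True
print(checked)  # ['a'] -> 'a' occurs inside "eat", so any() stops right there
```

If the matching word were last in the list, every word would be tested, so the worst case is still one substring search per word.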

Answer 1: (score: 3)

>>> words = ['a','la','los','shrimp']
>>> sentence = 'They eat shrimp in Mexico, and more these days'
>>> [word for word in words if word in sentence]
['a', 'shrimp']
>>> any(word in sentence for word in words)
True

words = ['a','la','los','shrimp']
sentence = 'They eat shrimp in Mexico, and more these days'
a = map(lambda word: word in words if word in sentence else None, words)
print(next(a))  # True

There are many ways to do this.

Since you asked about performance, here are the results.

In [1]: words = ['a','la','los','shrimp']

In [2]: sentence = 'They eat shrimp in Mexico

In [3]: a = map(lambda word: word in words if word in sentence else None, words)

In [4]: a.__next__()
Out[4]: True

In [5]: %timeit a
10000000 loops, best of 3: 65.3 ns per loop

In [6]: b = any(word in sentence for word in words)

In [7]: %timeit b
10000000 loops, best of 3: 50.1 ns per loop
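One caveat about the session above: `%timeit a` and `%timeit b` time the evaluation of a name that is already bound (a partly consumed map object and a bool), not the search itself, so the two figures say little about either approach. A minimal sketch of timing the work directly with the standard `timeit` module follows; the function names `with_any` and `with_loop` are made up for this example, and the numbers will vary by machine.

```python
import timeit

words = ['a', 'la', 'los', 'shrimp']
sentence = 'They eat shrimp in Mexico, and more these days'


def with_any():
    # Generator expression fed to any(); stops at the first match.
    return any(word in sentence for word in words)


def with_loop():
    # Explicit loop with an early return; same result as with_any().
    for word in words:
        if word in sentence:
            return True
    return False


results = {}
for fn in (with_any, with_loop):
    # Run each function 100,000 times, three times over, and keep the best run.
    results[fn.__name__] = min(timeit.repeat(fn, number=100_000, repeat=3))

for name, seconds in results.items():
    print(f'{name}: {seconds:.4f} s per 100,000 calls')
```

In IPython the equivalent would be `%timeit with_any()`, with the call parentheses included so that the function actually runs on every loop.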

words = ['a','la','fresh','shrimp','ban']
sentence = 'They fresh bananas in Mexico, and more these days'
a = [word for word in words if word in sentence.split(' ')]
print(a) #['fresh']

b = any(word in sentence.split(' ') for word in words)
print(b) #True
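The last snippet matches whole words rather than substrings, which is why 'ban' and 'a' are not reported even though both occur inside "bananas". It also calls sentence.split(' ') again for every word. A possible variation, assuming whole-word matching is what you want, splits once and keeps the tokens in a set; the name `tokens` is introduced here for illustration.

```python
words = ['a', 'la', 'fresh', 'shrimp', 'ban']
sentence = 'They fresh bananas in Mexico, and more these days'

# Split once; set membership is a hash lookup instead of a list scan.
tokens = set(sentence.split())

matches = [word for word in words if word in tokens]
print(matches)                                # ['fresh']
print(any(word in tokens for word in words))  # True

# Substring matching gives a different answer for the same word:
print('ban' in sentence)                      # True
print('ban' in tokens)                        # False
```

Note that punctuation stays attached to the tokens ('Mexico,' keeps its comma), so a real input may need some normalization first.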

Answer 2: (score: 1)

for word in words:
    if word in sentence:
        pass  # do stuff

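Answer 3: (score: 1)

Here is what I found.

The test was run on 100k words:

Note: if your list is long, building the set inside the function is expensive, but if you can build the set once before calling the function, the set wins.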
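The preprocessing idea in the note could look roughly like this: build the set of sentence words once, outside the function, and reuse it for every lookup. The names `has_any` and `word_set` are hypothetical, and the example assumes whole-word matching on an already tokenized sentence, as in this answer's test.

```python
def has_any(word_set, candidates):
    """Return True if at least one candidate is in the prebuilt set."""
    # isdisjoint() is True when the two collections share no element.
    return not word_set.isdisjoint(candidates)


sentence_words = 'They eat shrimp in Mexico and even more these days'.split()
word_set = set(sentence_words)  # built once, reused for every query

print(has_any(word_set, ['a', 'la', 'los', 'shrimp']))  # True
print(has_any(word_set, ['a', 'la', 'los']))            # False
```

The cost of set(sentence_words) is paid a single time here, so each later call only does hash lookups for the candidate words.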

Snippet:

#!/usr/bin/python

import cProfile
from timeit import Timer
from faker import Faker


def func1(sentence, words):
    return any(word in sentence for word in words)

def func2(sentence, words):
    for word in words:
        if word in sentence:
            return True
    return False

def func3(sentence, words):
    # using set
    sets = set(sentence)
    return bool(set(words).intersection(sets))

def func4(sentence, words):
    # using set
    sets = set(sentence)
    for word in words:
        if word in sets:
            return True
    return False

def func5(sentence, words):
    # using set
    sentence, words = set(sentence), set(words)
    return not words.isdisjoint(sentence)

s = Faker()
sentence = s.sentence(nb_words=100000).split()
words = 'quidem necessitatibus minus id quos in neque omnis molestias'.split()

func = [ func1, func2, func3, func4, func5 ]

for fun in func:
    t = Timer(lambda: fun(sentence, words))
    # cProfile.run prints its report itself and returns None
    print(fun.__name__, cProfile.run('t.timeit(number=1000)'))

Output:

Result: func2 wins

func1          5011 function calls in 0.032 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.032    0.032 <string>:1(<module>)
     1000    0.000    0.000    0.032    0.000 exists.py:42(<lambda>)
     1000    0.001    0.000    0.032    0.000 exists.py:8(func1)
     2000    0.030    0.000    0.030    0.000 exists.py:9(<genexpr>)
        1    0.000    0.000    0.000    0.000 timeit.py:143(setup)
        1    0.000    0.000    0.032    0.032 timeit.py:178(timeit)
        1    0.000    0.000    0.032    0.032 timeit.py:96(inner)
     1000    0.000    0.000    0.030    0.000 {any}
        1    0.000    0.000    0.000    0.000 {gc.disable}
        1    0.000    0.000    0.000    0.000 {gc.enable}
        1    0.000    0.000    0.000    0.000 {gc.isenabled}
        1    0.000    0.000    0.000    0.000 {globals}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 {time.time}


None
func2          2011 function calls in 0.031 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.031    0.031 <string>:1(<module>)
     1000    0.031    0.000    0.031    0.000 exists.py:11(func2)
     1000    0.000    0.000    0.031    0.000 exists.py:42(<lambda>)
        1    0.000    0.000    0.000    0.000 timeit.py:143(setup)
        1    0.000    0.000    0.031    0.031 timeit.py:178(timeit)
        1    0.000    0.000    0.031    0.031 timeit.py:96(inner)
        1    0.000    0.000    0.000    0.000 {gc.disable}
        1    0.000    0.000    0.000    0.000 {gc.enable}
        1    0.000    0.000    0.000    0.000 {gc.isenabled}
        1    0.000    0.000    0.000    0.000 {globals}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 {time.time}


None
func3          3011 function calls in 7.079 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    7.079    7.079 <string>:1(<module>)
     1000    7.069    0.007    7.073    0.007 exists.py:17(func3)
     1000    0.004    0.000    7.077    0.007 exists.py:42(<lambda>)
        1    0.000    0.000    0.000    0.000 timeit.py:143(setup)
        1    0.000    0.000    7.079    7.079 timeit.py:178(timeit)
        1    0.002    0.002    7.079    7.079 timeit.py:96(inner)
        1    0.000    0.000    0.000    0.000 {gc.disable}
        1    0.000    0.000    0.000    0.000 {gc.enable}
        1    0.000    0.000    0.000    0.000 {gc.isenabled}
        1    0.000    0.000    0.000    0.000 {globals}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1000    0.004    0.000    0.004    0.000 {method 'intersection' of 'set' objects}
        2    0.000    0.000    0.000    0.000 {time.time}


None
func4          2011 function calls in 7.022 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    7.022    7.022 <string>:1(<module>)
     1000    7.014    0.007    7.014    0.007 exists.py:22(func4)
     1000    0.006    0.000    7.020    0.007 exists.py:42(<lambda>)
        1    0.000    0.000    0.000    0.000 timeit.py:143(setup)
        1    0.000    0.000    7.022    7.022 timeit.py:178(timeit)
        1    0.002    0.002    7.022    7.022 timeit.py:96(inner)
        1    0.000    0.000    0.000    0.000 {gc.disable}
        1    0.000    0.000    0.000    0.000 {gc.enable}
        1    0.000    0.000    0.000    0.000 {gc.isenabled}
        1    0.000    0.000    0.000    0.000 {globals}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 {time.time}


None
func5          3011 function calls in 7.142 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    7.142    7.142 <string>:1(<module>)
     1000    7.133    0.007    7.134    0.007 exists.py:30(func5)
     1000    0.006    0.000    7.140    0.007 exists.py:42(<lambda>)
        1    0.000    0.000    0.000    0.000 timeit.py:143(setup)
        1    0.000    0.000    7.142    7.142 timeit.py:178(timeit)
        1    0.002    0.002    7.142    7.142 timeit.py:96(inner)
        1    0.000    0.000    0.000    0.000 {gc.disable}
        1    0.000    0.000    0.000    0.000 {gc.enable}
        1    0.000    0.000    0.000    0.000 {gc.isenabled}
        1    0.000    0.000    0.000    0.000 {globals}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1000    0.002    0.000    0.002    0.000 {method 'isdisjoint' of 'set' objects}
        2    0.000    0.000    0.000    0.000 {time.time}


None