我知道我能做到:
word = 'shrimp'
sentence = 'They eat shrimp in Mexico, and even more these days'
word in sentence
所以它会评估True
。
但如果我有:
words = ['a','la','los','shrimp']
如何评估句子是否包含任何words
元素?
我关心效果,因为列表可能很大,我不想使用循环扩展代码行
答案 0 :(得分:7)
可以读作英文。但可能不是最高效的。第一个基准,然后仅在需要时使用优化的解决方案。
any(word in sentence for word in words)
如果任何元素为True,则 any返回True,否则返回False。 (word in sentence for word in words)
是一个可迭代的,它会产生每个单词是否在句子中。
答案 1 :(得分:3)
>>> words = ['a','la','los','shrimp']
>>> sentence = 'They eat shrimp in Mexico, and more these days'
>>> [word for word in words if word in sentence]
['a', 'shrimp']
>>> any(word in sentence for word in words)
True
words = ['a','la','los','shrimp']
sentence = 'They eat shrimp in Mexico, and more these days'
a = map(lambda word: word in words if word in sentence else None, words)
print(a.__next__())
True
有很多方法可以执行此操作。
由于您要求提供效果,以下是结果。
In [1]: words = ['a','la','los','shrimp']
In [2]: sentence = 'They eat shrimp in Mexico
In [3]: a = map(lambda word: word in words if word in sentence else None, words)
In [4]: a.__next__()
Out[4]: True
In [5]: %timeit a
10000000 loops, best of 3: 65.3 ns per loop
In [6]: b = any(word in sentence for word in words)
In [7]: %timeit b
10000000 loops, best of 3: 50.1 ns per loop
words = ['a','la','fresh','shrimp','ban']
sentence = 'They fresh bananas in Mexico, and more these days'
a = [word for word in words if word in sentence.split(' ')]
print(a) #['fresh']
b = any(word in sentence.split(' ') for word in words)
print(b) #True
答案 2 :(得分:1)
for word in words:
if word in sentence:
#do stuff
答案 3 :(得分:1)
这是我发现的。
测试是在100k字上完成的:
注意:如果你的列表很长,那么在函数内部生成集合很昂贵,但是如果你可以在调用函数之前预先处理set,那么set就会胜利。
段:
#!/usr/bin/python
import cProfile
from timeit import Timer
from faker import Faker
def func1(sentence, words):
return any(word in sentence for word in words)
def func2(sentence, words):
for word in words:
if word in sentence:
return True
return False
def func3(sentence, words):
# using set
sets = set(sentence)
return bool(set(words).intersection(sets))
def func4(sentence, words):
# using set
sets = set(sentence)
for word in words:
if word in sets:
return True
return False
def func5(sentence, words):
# using set
sentence, words = set(sentence), set(words)
return not words.isdisjoint(sentence)
s = Faker()
sentence = s.sentence(nb_words=100000).split()
words = 'quidem necessitatibus minus id quos in neque omnis molestias'.split()
func = [ func1, func2, func3, func4, func5 ]
for fun in func:
t = Timer(lambda: fun(sentence, words))
print fun.__name__, cProfile.run('t.timeit(number=1000)')
输出:
结果:func2获胜
func1 5011 function calls in 0.032 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.032 0.032 <string>:1(<module>)
1000 0.000 0.000 0.032 0.000 exists.py:42(<lambda>)
1000 0.001 0.000 0.032 0.000 exists.py:8(func1)
2000 0.030 0.000 0.030 0.000 exists.py:9(<genexpr>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 0.032 0.032 timeit.py:178(timeit)
1 0.000 0.000 0.032 0.032 timeit.py:96(inner)
1000 0.000 0.000 0.030 0.000 {any}
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
func2 2011 function calls in 0.031 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.031 0.031 <string>:1(<module>)
1000 0.031 0.000 0.031 0.000 exists.py:11(func2)
1000 0.000 0.000 0.031 0.000 exists.py:42(<lambda>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 0.031 0.031 timeit.py:178(timeit)
1 0.000 0.000 0.031 0.031 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
func3 3011 function calls in 7.079 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 7.079 7.079 <string>:1(<module>)
1000 7.069 0.007 7.073 0.007 exists.py:17(func3)
1000 0.004 0.000 7.077 0.007 exists.py:42(<lambda>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 7.079 7.079 timeit.py:178(timeit)
1 0.002 0.002 7.079 7.079 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1000 0.004 0.000 0.004 0.000 {method 'intersection' of 'set' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
func4 2011 function calls in 7.022 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 7.022 7.022 <string>:1(<module>)
1000 7.014 0.007 7.014 0.007 exists.py:22(func4)
1000 0.006 0.000 7.020 0.007 exists.py:42(<lambda>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 7.022 7.022 timeit.py:178(timeit)
1 0.002 0.002 7.022 7.022 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
func5 3011 function calls in 7.142 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 7.142 7.142 <string>:1(<module>)
1000 7.133 0.007 7.134 0.007 exists.py:30(func5)
1000 0.006 0.000 7.140 0.007 exists.py:42(<lambda>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 7.142 7.142 timeit.py:178(timeit)
1 0.002 0.002 7.142 7.142 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1000 0.002 0.000 0.002 0.000 {method 'isdisjoint' of 'set' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None