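I have a list of strings, and I want to filter it down to the strings that contain any of several keywords.
I want to do something like this:
fruit = re.compile(['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi'])
so that I can use re.search(fruit, list_of_strings) to get only the strings containing a fruit, but I'm not sure how to use a list with re.compile. Any suggestions? (I'm not set on using re.compile, but I think regular expressions would be a good way to do this.)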
Answer 0 (score: 41)
You need to turn your fruit list into the string apple|banana|peach|plum|pineapple|kiwi so that it is a valid regex. The following should do this for you:
import re
fruit_list = ['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi']
fruit = re.compile('|'.join(fruit_list))
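To actually filter the list from the question with a pattern like this, something along these lines should work. It is only a sketch: list_of_strings and matching are made-up names with made-up data, and the re.escape call is an addition that is not part of the answer, included in case a keyword ever contains regex metacharacters.

```python
import re

fruit_list = ['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi']
# re.escape is an extra precaution (not in the answer above) so that
# keywords containing characters such as '.' or '+' are matched literally.
fruit = re.compile('|'.join(re.escape(f) for f in fruit_list))

# Hypothetical input data, not taken from the question.
list_of_strings = ['I ate an apple', 'no produce here', 'kiwi smoothie']

# Call search() once per string; re.search does not accept a list.
matching = [s for s in list_of_strings if fruit.search(s)]
print(matching)
```

The point is that search() is applied to each string in turn, so the filtering happens in the list comprehension rather than inside the regex call.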
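Edit: As pointed out in the comments, you may want to add word boundaries to the regex; otherwise it will match words like plump, because they contain a fruit as a substring.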
fruit = re.compile(r'\b(?:%s)\b' % '|'.join(fruit_list))
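A quick way to see the difference the word boundaries make is to run both patterns against the same text. The variable names and test strings below are invented for illustration.

```python
import re

fruit_list = ['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi']

# Pattern without word boundaries: matches fruit names inside other words.
plain = re.compile('|'.join(fruit_list))
# Pattern with word boundaries: matches fruit names only as whole words.
bounded = re.compile(r'\b(?:%s)\b' % '|'.join(fruit_list))

text = 'a plump cushion'  # hypothetical test string

plain_hit = plain.search(text) is not None      # 'plum' inside 'plump'
bounded_hit = bounded.search(text) is not None  # no whole-word fruit
whole_word_hit = bounded.search('a ripe plum') is not None

print(plain_hit, bounded_hit, whole_word_hit)
```

One side effect worth knowing about: the bounded pattern also stops matching plurals such as apples, since the s after apple means there is no word boundary at that position.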
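Answer 1 (score: 6)
Since you want exact matches, there's no real need for a regex, in my opinion...
fruits = ['apple', 'cherry']
sentences = ['green apple', 'yellow car', 'red cherry']
for s in sentences:
    # "f in s" is a plain substring test, no regex involved
    if any(f in s for f in fruits):
        print(s, 'contains a fruit!')
# green apple contains a fruit!
# red cherry contains a fruit!
Edit: If you need access to the strings that matched: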
from itertools import compress
fruits = ['apple', 'banana', 'cherry']
s = 'green apple and red cherry'
list(compress(fruits, (f in s for f in fruits)))
# ['apple', 'cherry']
Answer 2 (score: 2)
You can create a single regex that matches when any of the terms is found:
>>> s, t = "A kiwi, please.", "Strawberry anyone?"
>>> import re
>>> pattern = re.compile('apple|banana|peach|plum|pineapple|kiwi', re.IGNORECASE)
>>> pattern.search(s)
<_sre.SRE_Match object at 0x10046d4a8>
>>> pattern.search(t) # won't find anything
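If you also want to know which term was found, the match object returned by search() carries it. Here is a rough sketch; the sentences list and the found dict are made up for the example.

```python
import re

pattern = re.compile('apple|banana|peach|plum|pineapple|kiwi', re.IGNORECASE)

# Hypothetical sample sentences with mixed case.
sentences = ["A Kiwi, please.", "Strawberry anyone?", "PEACH cobbler"]

found = {}
for sentence in sentences:
    match = pattern.search(sentence)
    if match:  # search() returns None when nothing matches
        # group(0) is the text as it appears in the sentence,
        # so the original casing is preserved.
        found[sentence] = match.group(0)

print(found)
```

Because of re.IGNORECASE, the matched text keeps whatever case the sentence used; lowercase it yourself if you need to compare it against the keyword list.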
Answer 3 (score: 2)
Code:
import re
fruits = ['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi']
fruit_re = [re.compile(fruit) for fruit in fruits]
fruit_test = lambda x: any([pattern.search(x) for pattern in fruit_re])
Usage example:
fruits_veggies = ['this is an apple', 'this is a tomato']
[fruit_test(s) for s in fruits_veggies]
# [True, False]
Edit: I realize Andrew's solution is better. You can improve fruit_test with Andrew's regex:
# andrew_re is assumed to be the compiled pattern from Andrew's answer
fruit_test = lambda x: andrew_re.search(x) is not None
Answer 4 (score: 0)
Python 3.x update:
fruit_list = ['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi']
fruit = re.compile(r'\b(?:{0})\b'.format('|'.join(fruit_list)))