Question

What is the best way to search for matching words inside a string?

Right now I am doing something like this:

if re.search('([h][e][l][l][o])',file_name_tmp, re.IGNORECASE):

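This works, but it is slow, because I may have around 100 different regex statements, each searching for a whole word. So I would like to combine several of them using the | separator or something similar.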

Answer 1

You could try:

if 'hello' in longtext:

or

if 'HELLO' in longtext.upper():

to match hello / Hello / HELLO.
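For around 100 words, the same idea can be wrapped in a small helper. This is only a sketch: `contains_any` is a name made up here, and it does plain substring matching, so 'hello' would also be found inside 'Othello'.

```python
def contains_any(text, words):
    """Return True if any word occurs in text as a substring, ignoring case."""
    lowered = text.lower()  # lowercase the text once, not once per word
    return any(word.lower() in lowered for word in words)


print(contains_any("Say HeLLo.txt", ["hello", "goodbye"]))  # True
print(contains_any("notes.txt", ["hello", "goodbye"]))      # False
```

Lowercasing the text once up front avoids repeating that work for every word in the list.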

Answer 2

>>> import re
>>> words = ('hello', 'good-bye', 'red', 'blue')
>>> pattern = re.compile('(' + '|'.join(words) + ')', re.IGNORECASE)
>>> sentence = 'SAY HeLLo TO reD, good-bye to Blue.'
>>> print(pattern.findall(sentence))
['HeLLo', 'reD', 'good-bye', 'Blue']
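If the words can contain regex metacharacters, it is probably safer to pass each one through `re.escape` before joining. A minimal sketch, where `build_pattern` and the extra word 'c++' are just illustrative names and data, not part of the question:

```python
import re


def build_pattern(words):
    """Compile one case-insensitive pattern that matches any of the given words literally."""
    # Longest first, so a longer word is tried before a shorter word that is its prefix.
    ordered = sorted(words, key=len, reverse=True)
    return re.compile("|".join(re.escape(w) for w in ordered), re.IGNORECASE)


pattern = build_pattern(["hello", "good-bye", "red", "blue", "c++"])
print(pattern.findall("SAY HeLLo TO reD, good-bye to Blue and C++."))
# ['HeLLo', 'reD', 'good-bye', 'Blue', 'C++']
```

Compiling the pattern once and reusing it for every file name should avoid running 100 separate searches per string.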

Answer 3

If you want to check whether 'hello' or some other complete word occurs in a string, you can also do

if 'hello' in stringToMatch:
    ... # Match found , do something

To look for several different strings, you can also use findall

>>> import re
>>> toMatch = 'e3e3e3eeehellloqweweemeeeeefe'
>>> regex = re.compile("hello|me", re.IGNORECASE)
>>> print(regex.findall(toMatch))
['me']
>>> toMatch = 'e3e3e3eeehelloqweweemeeeeefe'
>>> print(regex.findall(toMatch))
['hello', 'me']
>>> toMatch = 'e3e3e3eeeHelLoqweweemeeeeefe'
>>> print(regex.findall(toMatch))
['HelLo', 'me']

Answer 4

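You said you want to search for WORDS. What is your definition of a "word"? If you are looking for "meet", do you really want to match the "meet" in "meeting"? If not, you might want to try something like this: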

>>> import re
>>> query = ("meet", "lot")
>>> text = "I'll meet a lot of friends including Charlotte at the town meeting"
>>> regex = r"\b(" + "|".join(query) + r")\b"
>>> re.findall(regex, text, re.IGNORECASE)
['meet', 'lot']
>>>

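The \b on each end forces it to match only at word boundaries, using re's definition of "word": "isn't" isn't a word, it's two words separated by an apostrophe. If you don't like that, look at the nltk package.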
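To see how the word boundaries and the apostrophe behave, here is a small sketch. `find_whole_words` is a hypothetical helper name; it adds `re.escape` and a non-capturing group on top of the pattern above, and it assumes every query word starts and ends with a letter, digit, or underscore, since \b only behaves this way next to such characters.

```python
import re


def find_whole_words(text, words):
    """Return every occurrence of the given words that stands alone as a whole word."""
    alternatives = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return pattern.findall(text)


text = "I'll meet a lot of friends including Charlotte at the town meeting"
print(find_whole_words(text, ["meet", "lot"]))  # ['meet', 'lot']

# re treats the apostrophe as a boundary, so "isn't" splits into two words.
print(re.findall(r"\b\w+\b", "isn't"))  # ['isn', 't']
```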
