Question

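I am trying to find the position of a string (word) in a sentence, using the function below. It works for most words, but for the string GLC-SX-MM= in the sentence I have a lot of GLC-SX-MM= in my inventory list I cannot get a match. I tried escaping - and =, but that did not help. Any ideas? I cannot split the sentence on spaces, because some of my keywords are compound words that themselves contain spaces.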

import re 

def get_start_end(self, sentence, key):
        r = re.compile(r'\b(%s)\b' % key, re.I)
        m = r.search(sentence)
        start = m.start()
        end = m.end()
        return start, end

Answer 1

When searching for a literal string, you need to escape the key and use the unambiguous boundaries (?<!\w) and (?!\w) instead of \b:

import re 

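def get_start_end(self, sentence, key):
    # Escape the key so that characters such as . and - are matched literally
    r = re.compile(r'(?<!\w){}(?!\w)'.format(re.escape(key)), re.I)
    m = r.search(sentence)  # search the sentence that was passed in
    start = m.start()
    end = m.end()
    return start, end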

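r'(?<!\w){}(?!\w)'.format(re.escape(key)) builds a regex such as (?<!\w)abc\.def\=(?!\w) from the keyword abc.def= (on Python 3.7 and later, re.escape leaves = unescaped, so the result is (?<!\w)abc\.def=(?!\w), which matches the same text). (?<!\w) fails the match if there is a word character immediately to the left of the keyword, and (?!\w) fails the match if there is a word character immediately to the right of it.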
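
To see the difference on the sentence from the question, here is a minimal sketch. The helper name find_span is made up for this illustration, and returning None when nothing matches is an added assumption that is not part of the original function. \b only matches between a word character and a non-word character, so the trailing \b cannot match between = and the following space, whereas (?!\w) only requires that no word character follows.

```python
import re

def find_span(sentence, key):
    """Return (start, end) of key as a whole token in sentence, or None."""
    # Hypothetical helper: escape the key, then require that no word
    # character touches it on either side.
    pattern = r'(?<!\w){}(?!\w)'.format(re.escape(key))
    m = re.search(pattern, sentence, re.I)
    return (m.start(), m.end()) if m else None

sentence = "I have a lot of GLC-SX-MM= in my inventory list"
key = "GLC-SX-MM="

# With \b, escaping alone is not enough: '=' followed by ' ' is not a
# word boundary, so the trailing \b can never match here.
word_boundary = re.search(r'\b(%s)\b' % re.escape(key), sentence, re.I)
print(word_boundary)             # None
print(find_span(sentence, key))  # (16, 26)
```

Checking the result for None before calling start() and end() avoids an AttributeError when the keyword is not in the sentence.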

Answer 2

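This is not really an answer, but it helps with solving the problem.

You can print the dynamically built pattern in order to debug it.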

import re 

def get_start_end(sentence, key):
        r = re.compile(r'\b(%s)\b' % key, re.I)
        print(r.pattern)

sentence = "foo-bar is not foo=bar"

get_start_end(sentence, 'o-')
get_start_end(sentence, 'o=')

\b(o-)\b
\b(o=)\b

You can then try the pattern by hand, for example on https://regex101.com/, to see whether it matches.
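
The same check can also be done locally. The following is a small sketch that reuses the sentence above and adds two extra keys, foo- and foo=, chosen only for illustration. It prints each generated pattern together with the span it matches, or None.

```python
import re

sentence = "foo-bar is not foo=bar"

results = {}
for key in ('o-', 'o=', 'foo-', 'foo='):
    pattern = r'\b(%s)\b' % key  # same unescaped pattern as above
    m = re.search(pattern, sentence, re.I)
    results[key] = m.span() if m else None
    print(pattern, '->', results[key])
```

Here o- and o= give None because the leading \b cannot match between two word characters (the o before them), while foo- and foo= match because a word character (b) follows the special character.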

Searching for whole words that contain leading or trailing special characters such as - and = using regex in Python
