Question

I am trying to extract locations using regex in Python. Right now I am doing this:

def get_location(s):
    s = s.strip(STRIP_CHARS)
    keywords = "at|outside|near"
    location_pattern = "(?P<location>((?P<place>{keywords}\s[A-Za-z]+)))".format(keywords = keywords)
    location_regex = re.compile(location_pattern, re.IGNORECASE | re.MULTILINE | re.UNICODE | re.DOTALL | re.VERBOSE)

    for match in location_regex.finditer(s):
        match_str = match.group(0)
        indices = match.span(0)
        print ("Match", match)
        match_str = match.group(0)
        indices = match.span(0)
        print (match_str)

get_location("Im at building 3")

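I have three questions:

It only gives "at" as output, but it should also give "building".
captures = match.capturesdict(): I cannot use it to extract the captures, although it works in other examples.
When I use location_pattern = 'at|outside\s\w+' instead, it seems to work. Can someone explain what I am doing wrong?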

Answer 1

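The main issue here is that you need to put {keywords} into a non-capturing group: (?:{keywords}). Here is a schematic example: a|b|c\s+\w+ matches either a, or b, or c + <whitespace(s)> + <word char(s)>. When you put the alternation list into a group, (a|b|c)\s+\w+, it matches either a, or b, or c, and only then tries to match the whitespace(s) followed by word chars.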
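The difference in alternation scope can be checked with a small sketch. It uses the standard library re module rather than regex, and the variable names (ungrouped, grouped) are illustrative only:

```python
import re

text = "Im at building 3"

# Without a group, the alternation splits the whole pattern into
# three branches: "at" OR "outside" OR "near\s+\w+".
ungrouped = re.compile(r"at|outside|near\s+\w+", re.IGNORECASE)

# With a non-capturing group, the alternation is confined to the keywords,
# so \s+\w+ is required after whichever keyword matched.
grouped = re.compile(r"(?:at|outside|near)\s+\w+", re.IGNORECASE)

ungrouped_matches = [m.group(0) for m in ungrouped.finditer(text)]
grouped_matches = [m.group(0) for m in grouped.finditer(text)]

print(ungrouped_matches)  # ['at']
print(grouped_matches)    # ['at building']
```

Only the grouped pattern returns the keyword together with the word that follows it.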

See the updated code (demo online):

import regex as re
def get_location(s):
    STRIP_CHARS = '*'
    s = s.strip(STRIP_CHARS)
    keywords = "at|outside|near"
    location_pattern = "(?P<location>((?P<place>(?:{keywords})\s+[A-Za-z]+)))".format(keywords = keywords)
    location_regex = re.compile(location_pattern, re.IGNORECASE | re.UNICODE)

    for match in location_regex.finditer(s):
        match_str = match.group(0)  # full text of the match
        indices = match.span(0)     # (start, end) offsets of the match
        print ("Match", match)
        print (match_str)
        captures = match.capturesdict()
        print(captures)

get_location("Im at building 3")

Output:

('Match', <regex.Match object; span=(3, 14), match='at building'>)
at building
{'place': ['at building'], 'location': ['at building']}

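Note that location_pattern = 'at|outside\s\w+' does not work properly: at is matched anywhere on its own, while only outside must be followed by a whitespace and word chars. You can fix it the same way: (at|outside)\s\w+.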

If you put the keywords into a group, captures = match.capturesdict() will work as expected (see the output above).
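Note that capturesdict() belongs to the third-party regex module. If only the standard library re is available, match.groupdict() is a possible substitute; it returns a single string per named group instead of a list of every capture. A minimal sketch, assuming the same named groups as in the code above:

```python
import re

# Same structure as the answer's pattern, written as a raw string.
pattern = re.compile(
    r"(?P<location>(?P<place>(?:at|outside|near)\s+[A-Za-z]+))",
    re.IGNORECASE,
)

match = pattern.search("Im at building 3")

# groupdict() maps each named group to the text it matched.
groups = match.groupdict() if match else {}
print(groups)
```

For this input both named groups hold the same text, because place spans the whole of location.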
