I am trying to extract locations using regex in Python. Right now I am doing this:

def get_location(s):
    s = s.strip(STRIP_CHARS)
    keywords = "at|outside|near"
    location_pattern = "(?P<location>((?P<place>{keywords}\s[A-Za-z]+)))".format(keywords = keywords)
    location_regex = re.compile(location_pattern, re.IGNORECASE | re.MULTILINE | re.UNICODE | re.DOTALL | re.VERBOSE)
    for match in location_regex.finditer(s):
        match_str = match.group(0)
        indices = match.span(0)
        print ("Match", match)
        match_str = match.group(0)
        indices = match.span(0)
        print (match_str)

get_location("Im at building 3")

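I have three questions:
captures = match.capturesdict()
I cannot use it to extract the captures the way other examples do.
With location_pattern = 'at|outside\s\w+' the match seems to work. Can someone explain what I am doing wrong?

Answer 0 (score: 1)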
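The main problem here is that you need to put {keywords} inside a non-capturing group: (?:{keywords}). Here is a schematic example: a|b|c\s+\w+ matches either a, or b, or c + <whitespace(s)> + <word chars>. When you put the alternation list into a group, (a|b|c)\s+\w+, the regex first matches either a, or b, or c, and only then tries to match whitespace and then word characters.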
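The schematic patterns above can be checked directly. The following is a minimal sketch using the standard library re module; the sample string "a x b y c z" is an assumption made up for illustration and does not come from the question.

```python
import re

text = "a x b y c z"  # hypothetical sample input

# Ungrouped: the three alternatives are "a", "b", and "c\s+\w+" as a whole.
ungrouped = re.findall(r"a|b|c\s+\w+", text)

# Grouped: "\s+\w+" must follow whichever of a, b, or c matched.
grouped = re.findall(r"(?:a|b|c)\s+\w+", text)

print(ungrouped)  # ['a', 'b', 'c z']
print(grouped)    # ['a x', 'b y', 'c z']
```

Only the last alternative picks up the trailing word in the ungrouped form, which is the behavior described above.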
See the updated code (demo online):

import regex as re  # third-party "regex" module (pip install regex); it provides capturesdict()

def get_location(s):
    STRIP_CHARS = '*'
    s = s.strip(STRIP_CHARS)
    keywords = "at|outside|near"
    # (?:...) groups the alternation so that \s+[A-Za-z]+ applies to every keyword
    location_pattern = r"(?P<location>((?P<place>(?:{keywords})\s+[A-Za-z]+)))".format(keywords=keywords)
    location_regex = re.compile(location_pattern, re.IGNORECASE | re.UNICODE)
    for match in location_regex.finditer(s):
        match_str = match.group(0)   # full matched text
        indices = match.span(0)      # (start, end) of the match
        print ("Match", match)
        print (match_str)
        captures = match.capturesdict()  # all captures of each named group
        print(captures)

get_location("Im at building 3")

Output:
('Match', <regex.Match object; span=(3, 14), match='at building'>)
at building
{'place': ['at building'], 'location': ['at building']}
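Note that location_pattern = 'at|outside\s\w+' does not work as intended: at matches anywhere on its own, while only outside has to be followed by a whitespace character and word characters. You can fix it the same way: (at|outside)\s\w+.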
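To see why the bare at alternative is a problem, here is a small sketch with the standard library re module. The helper name find_all and the inputs "Saturday" and "Im outside library" are assumptions for illustration; only "Im at building 3" comes from the question.

```python
import re

def find_all(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]

UNGROUPED = r"at|outside\s\w+"
GROUPED = r"(?:at|outside)\s\w+"

print(find_all(UNGROUPED, "Im at building 3"))    # ['at']
print(find_all(UNGROUPED, "Saturday"))            # ['at']
print(find_all(UNGROUPED, "Im outside library"))  # ['outside library']
print(find_all(GROUPED, "Im at building 3"))      # ['at building']
print(find_all(GROUPED, "Saturday"))              # []
```

The ungrouped pattern reports a bare at even inside an unrelated word, whereas the grouped pattern only matches when a keyword is followed by whitespace and a word.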
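If you put the keywords into a group, captures = match.capturesdict() will work fine (see the output above). Note that capturesdict() is provided by the third-party regex module (imported above as re), not by the standard library re module.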
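If installing the third-party regex module is not an option, a rough standard-library alternative is match.groupdict(). This is a sketch under that assumption, not the approach used in the answer: groupdict() returns a single string per named group (the last capture) rather than the list of all captures that capturesdict() gives. The names LOCATION_RE and named_groups are made up for this example.

```python
import re

# Same grouped pattern as in the answer, compiled with the stdlib re module.
LOCATION_RE = re.compile(
    r"(?P<location>(?P<place>(?:at|outside|near)\s+[A-Za-z]+))",
    re.IGNORECASE,
)

def named_groups(text):
    """Return one dict of named groups per match, using only the stdlib."""
    return [m.groupdict() for m in LOCATION_RE.finditer(text)]

print(named_groups("Im at building 3"))
```

For patterns like this one, where each named group captures at most once per match, the two methods carry the same information; they differ only when a group sits inside a repetition.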