Question

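I have a list of strings and want to find an exact phrase.

So far my code only finds the month and year, but the whole phrase includes " - Recorded", for example "March 2016 - Recorded".

How can I add " - Recorded" to the regex?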

import re

texts = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    "Soondren Armon took medical leave in February 2017 - Recorded It was in",
    "David Padachi took annual leave in May 2016 - Recorded It says",
    "Jack Jagoo",
    "Devendradutt Ramgolam took medical leave in August 2016 - Recorded Day back",
    "Kate Dudhee",
    "Vinaye Ramjuttun took annual leave in  - Recorded Answering"
]

# Matches only the month and year; " - Recorded" is not part of the match yet.
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s')

for t in texts:
    m = regex.search(t)
    if m:
        print(m.group())
    else:
        # search() returns None when the pattern is not found
        print("keyword's not found")

Answer 1

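You have two named groups here, month and year, which capture the month and the year from your string. To capture "- Recorded" in a named group called recorded, you can do the following: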

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<recorded>- Recorded)')

Or, if you just want to add "- Recorded" to the regex without a named group:

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s- Recorded')

Or you can add a named group other that matches a hyphen followed by a single capitalized word:

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<other>- [A-Z][a-z]+)')

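I think the first or third option is preferable, since you already use named groups. I also recommend the site http://pythex.org/, which really helps when building regular expressions :).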
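For illustration, here is a minimal sketch of how the first option could be used, assuming Python 3 and two of the sample strings from the question. The results list is only a name chosen for this example. group(0) returns the whole matched phrase, while groupdict() returns the named groups.

```python
import re

# Pattern from the first option, written as a raw string so the
# backslashes reach the regex engine unchanged.
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<recorded>- Recorded)')

# Two of the sample strings from the question.
texts = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    "Jack Jagoo",
]

results = []  # hypothetical name, used only in this sketch
for t in texts:
    m = regex.search(t)
    if m:
        # group(0) is the full phrase; groupdict() maps group names to text.
        results.append((m.group(0), m.groupdict()))
    else:
        # No month/year/"- Recorded" phrase in this string.
        results.append(None)

print(results)
```

The same loop should work with the third option if you read m.group('other') instead of m.group('recorded').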

Answer 2

Use a list comprehension with the corrected regex:

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s* - Recorded')

matches = [match.groups() for text in texts for match in [regex.search(text)] if match]
print(matches)
# [('March', '2016'), ('February', '2017'), ('May', '2016'), ('August', '2016')]
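Note that groups() returns only the captured month and year. If you want the whole phrase including " - Recorded", one possible variation (a sketch, assuming Python 3; phrases is just a name picked for this example) is to take group(0) of each match instead:

```python
import re

# Same pattern as above, as a raw string.
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s* - Recorded')

# Sample strings from the question.
texts = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    "Soondren Armon took medical leave in February 2017 - Recorded It was in",
    "David Padachi took annual leave in May 2016 - Recorded It says",
    "Jack Jagoo",
    "Devendradutt Ramgolam took medical leave in August 2016 - Recorded Day back",
    "Kate Dudhee",
    "Vinaye Ramjuttun took annual leave in  - Recorded Answering",
]

# group(0) is the entire matched text, not just the captured groups.
# Strings without a month and a four-digit year are skipped.
phrases = [m.group(0) for m in map(regex.search, texts) if m]
print(phrases)
# ['March 2016 - Recorded', 'February 2017 - Recorded', 'May 2016 - Recorded', 'August 2016 - Recorded']
```

The last sample string has " - Recorded" but no month or year, so it does not match and is left out.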

Python regex to find a phrase containing an exact word
