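I have a list of strings and want to find an exact phrase.
So far my code only finds the month and year, but the whole phrase also includes " - Recorded", e.g. "March 2016 - Recorded".
How can I add " - Recorded" to the regex?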
import re
texts = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    "Soondren Armon took medical leave in February 2017 - Recorded It was in",
    "David Padachi took annual leave in May 2016 - Recorded It says",
    "Jack Jagoo",
    "Devendradutt Ramgolam took medical leave in August 2016 - Recorded Day back",
    "Kate Dudhee",
    "Vinaye Ramjuttun took annual leave in - Recorded Answering"
]
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s')
for t in texts:
    m = regex.search(t)
    # search() returns None when there is no match
    if m:
        print(m.group())
    else:
        print("keyword's not found")
Answer 0 (score: 2)
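You have 2 named groups here, month and year, which capture the month and the year in your string. To capture "- Recorded" in a named group called recorded, you can do the following: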
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<recorded>- Recorded)')
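A minimal sketch of reading the pieces back with this pattern. The two sample strings come from the question; the results list and the None placeholder for strings without a match are my own illustration, not part of the answer:

```python
import re

# The first pattern above, as a raw string so "\s" and "\d" reach the regex engine unchanged
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<recorded>- Recorded)')

texts = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    "Vinaye Ramjuttun took annual leave in - Recorded Answering",
]

# Illustrative container: one tuple per match, None where search() finds nothing
results = []
for t in texts:
    m = regex.search(t)
    if m:
        # group() is the whole match; the named groups give the individual pieces
        results.append((m.group(), m.group('month'), m.group('year'), m.group('recorded')))
    else:
        results.append(None)

for r in results:
    print(r)
```

The second string has no month and year, so the whole pattern fails and search() returns None.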
Alternatively, if you just want to add "- Recorded" to the regex without a named group:
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s- Recorded')
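With this variant "- Recorded" is part of the overall match but has no group of its own. A quick check, using one of the question's strings, of what the match object exposes:

```python
import re

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s- Recorded')

m = regex.search("David Padachi took annual leave in May 2016 - Recorded It says")
# "- Recorded" is not captured separately, so it only appears in the full match
phrase = m.group()
parts = m.groupdict()
print(phrase)  # May 2016 - Recorded
print(parts)   # {'month': 'May', 'year': '2016'}
```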
Or you can add a named group other that matches a hyphen followed by a capitalized word:
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<other>- [A-Z][a-z]+)')
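Because [A-Z][a-z]+ accepts any capitalized word, this third pattern is not tied to "Recorded". The sketch below shows that; the second string with "- Approved" is a hypothetical example, not data from the question:

```python
import re

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<other>- [A-Z][a-z]+)')

samples = [
    "Soondren Armon took medical leave in February 2017 - Recorded It was in",
    # Hypothetical status word, not from the question's data
    "Someone took annual leave in June 2018 - Approved Later",
]

# Both samples are known to match, so search() never returns None here
others = [regex.search(s).group('other') for s in samples]
print(others)  # ['- Recorded', '- Approved']
```

Whether that flexibility is wanted depends on your data; if only "Recorded" should count, the first pattern is stricter.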
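I think the first or third option is preferable, since you already use named groups. I'd also recommend the site http://pythex.org/, which really helps when building regexes :).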
Answer 1 (score: 1)
Use a list comprehension with the corrected regex:
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s* - Recorded')
matches = [match.groups() for text in texts for match in [regex.search(text)] if match]
print(matches)
# [('March', '2016'), ('February', '2017'), ('May', '2016'), ('August', '2016')]
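groups() returns only the captured parts. If you want the exact phrase the question asks for, one option (my variation on the answer, not the answerer's code) is to collect group(0), the whole matched text, instead:

```python
import re

texts = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    "Soondren Armon took medical leave in February 2017 - Recorded It was in",
    "David Padachi took annual leave in May 2016 - Recorded It says",
    "Jack Jagoo",
    "Devendradutt Ramgolam took medical leave in August 2016 - Recorded Day back",
    "Kate Dudhee",
    "Vinaye Ramjuttun took annual leave in - Recorded Answering",
]

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s* - Recorded')

# group(0) is the whole matched text, i.e. the phrase itself rather than its parts;
# strings without a match (search() returns None) are skipped by "if m"
phrases = [m.group(0) for m in map(regex.search, texts) if m]
print(phrases)
```

This prints ['March 2016 - Recorded', 'February 2017 - Recorded', 'May 2016 - Recorded', 'August 2016 - Recorded'].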