Python regex to find a phrase that includes an exact word

Time: 2017-03-29 08:28:07

Tags: python regex

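I have a list of strings and want to find an exact phrase in each one.

So far my code only finds the month and year, but the whole phrase also includes " - Recorded", for example "March 2016 - Recorded".

How do I add " - Recorded" to the regex?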

import re

texts = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    "Soondren Armon took medical leave in February 2017 - Recorded It was in",
    "David Padachi took annual leave in May 2016 - Recorded It says",
    "Jack Jagoo",
    "Devendradutt Ramgolam took medical leave in August 2016 - Recorded Day back",
    "Kate Dudhee",
    "Vinaye Ramjuttun took annual leave in  - Recorded Answering"
]

# Raw string so that \s and \d reach the regex engine unchanged
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s')

for t in texts:
    m = regex.search(t)
    # search() returns None when nothing matches, so test the result
    # instead of catching the resulting AttributeError
    if m:
        print(m.group())
    else:
        print("keyword's not found")

2 answers:

Answer 0 (score: 2)

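You have 2 named groups here, month and year, which capture the month and the year from your string. To capture "- Recorded" in a named group called recorded, you can do this: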

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<recorded>- Recorded)')

Or, if you just want to add "- Recorded" to the regex without a named group:

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s- Recorded')

Or you can add a named group called other that matches a hyphen followed by one capitalized word:

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<other>- [A-Z][a-z]+)')

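I think the first or third option is preferable, since you already have named groups. I'd also suggest the site http://pythex.org/, which really helps when building regular expressions :).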
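To see how the three options differ, here is a small sketch that runs each pattern against two strings. The second string ("Jane Doe ... - Pending Review") is made up for illustration and is not from the question, and the dictionary keys are just labels chosen here.

```python
import re

# The three patterns from this answer, written as raw strings.
patterns = {
    "named group": re.compile(
        r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<recorded>- Recorded)'),
    "literal": re.compile(
        r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s- Recorded'),
    "generic word": re.compile(
        r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<other>- [A-Z][a-z]+)'),
}

samples = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    # Hypothetical string, added only to show where the patterns diverge.
    "Jane Doe took annual leave in April 2016 - Pending Review",
]

# For each pattern, collect the full matched text (or None) per sample.
results = {}
for name, pattern in patterns.items():
    results[name] = [m.group() if m else None
                     for m in map(pattern.search, samples)]

for name, found in results.items():
    print(name, found)

# A named group can also be read back on its own.
first = patterns["named group"].search(samples[0])
recorded = first.group("recorded")
print(recorded)
```

The first two patterns only accept the literal "- Recorded", so they skip the second string, while the third accepts any single capitalized word after the hyphen. Which one fits depends on whether other status words should count as a match.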

Answer 1 (score: 1)

Use a list comprehension with the corrected regex:

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s* - Recorded')

matches = [match.groups() for text in texts for match in [regex.search(text)] if match]
print(matches)
# [('March', '2016'), ('February', '2017'), ('May', '2016'), ('August', '2016')]
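If the group names matter more than tuple positions, one possible variation is to collect match.groupdict() instead of match.groups(). This is a sketch rather than part of the original answer. It reuses the list from the question, and the records variable is a name introduced here.

```python
import re

texts = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    "Soondren Armon took medical leave in February 2017 - Recorded It was in",
    "David Padachi took annual leave in May 2016 - Recorded It says",
    "Jack Jagoo",
    "Devendradutt Ramgolam took medical leave in August 2016 - Recorded Day back",
    "Kate Dudhee",
    "Vinaye Ramjuttun took annual leave in  - Recorded Answering",
]

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s* - Recorded')

# Keep one dict per matching string, keyed by the named groups.
records = []
for text in texts:
    match = regex.search(text)
    if match is None:
        continue  # no "<month> <year> - Recorded" phrase in this string
    records.append(match.groupdict())

for record in records:
    print(record["month"], record["year"])
```

Strings without a month and a four-digit year before "- Recorded" are skipped, which includes the last entry, where the date is missing.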