My regex mostly works, but I can't tell what's wrong with certain parts

Asked: 2015-02-12 14:30:21

Tags: python regex

Hi, I'm having a problem with the regex at the following link.

https://regex101.com/r/wU4xK1/1

It matches almost all of the patterns, but it fails when it runs into certain characters or a line break.

My regex is:

 (\b(?:(jan|january|feb|february|mar|march|apr|april|may|jun|june|jul|july|aug|august|set|sep|september|oct|october|nov|november|dec|december)[/\.\s',’-]{0,4}\d{2,4}|(jan|january|feb|february|mar|march|apr|april|may|jun|june|jul|july|aug|august|set|sep|september|oct|october|nov|november|dec|december))[/\r-–––,]{0,4}[a-zA-Z]{3,8}[/\.\s',’-]{0,2}[\s]{0,4}\d{2,4})

My text is:

July 2005 – December - 2006 

(Nov '12 - Feb 12)

(Nov 12 - Feb 12       )

july 2005 – Dec 2012 ## Note here. If i press enter after Dec 2012 I will    get a match. Dont know why ?

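3 Answers:

Answer 0: (score: 2)

Just turn the capturing groups into non-capturing groups, then wrap the whole pattern in a single capturing group. (In the pattern below, the month group in the second alternative is still capturing, which is why re.findall returns 2-tuples.)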

((?:\b(?:(?:jan|january|feb|february|mar|march|apr|april|may|jun|june|jul|july|aug|august|set|sep|september|oct|october|nov|november|dec|december)[/\.\s',’-]{0,4}\d{2,4}|(jan|january|feb|february|mar|march|apr|april|may|jun|june|jul|july|aug|august|set|sep|september|oct|october|nov|november|dec|december))[/\r-–––,]{0,4}[a-zA-Z]{3,8}[/\.\s',’-]{0,2}[\s]{0,4}\d{2,4}))

DEMO

>>> import re
>>> s = '''July 2005 – December - 2006 

(Nov '12 - Feb 12)

(Nov 12 - Feb 12       )

july 2005 – Dec 2012 ## Note here. If i press enter after Dec 2012 I will    get a match. Dont know why ?'''
>>> re.findall(r"(?mi)((?:\b(?:(?:jan|january|feb|february|mar|march|apr|april|may|jun|june|jul|july|aug|august|set|sep|september|oct|october|nov|november|dec|december)[/\.\s',’-]{0,4}\d{2,4}|(jan|january|feb|february|mar|march|apr|april|may|jun|june|jul|july|aug|august|set|sep|september|oct|october|nov|november|dec|december))[/\r-–––,]{0,4}[a-zA-Z]{3,8}[/\.\s',’-]{0,2}[\s]{0,4}\d{2,4}))", s)
[('July 2005 – December - 2006', ''), ("Nov '12 - Feb 12", ''), ('Nov 12 - Feb 12', ''), ('july 2005 – Dec 2012', '')]
>>> m = re.findall(r"(?mi)((?:\b(?:(?:jan|january|feb|february|mar|march|apr|april|may|jun|june|jul|july|aug|august|set|sep|september|oct|october|nov|november|dec|december)[/\.\s',’-]{0,4}\d{2,4}|(jan|january|feb|february|mar|march|apr|april|may|jun|june|jul|july|aug|august|set|sep|september|oct|october|nov|november|dec|december))[/\r-–––,]{0,4}[a-zA-Z]{3,8}[/\.\s',’-]{0,2}[\s]{0,4}\d{2,4}))", s)
>>> [x for x, y in m]
['July 2005 – December - 2006', "Nov '12 - Feb 12", 'Nov 12 - Feb 12', 'july 2005 – Dec 2012']
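
A minimal sketch of why the grouping matters, using a much shorter, hypothetical pattern rather than the full one above. With more than one capturing group, re.findall returns a tuple per match; with a single outer capturing group around non-capturing inner groups, it returns plain strings.

```python
import re

text = "Nov 12 - Feb 12"

# Two capturing groups: findall returns one tuple per match.
with_groups = re.findall(r"(?i)(nov|feb) (\d{2})", text)

# Non-capturing inner group inside one outer capturing group:
# findall returns one plain string per match.
single_group = re.findall(r"(?i)((?:nov|feb) \d{2})", text)

print(with_groups)   # [('Nov', '12'), ('Feb', '12')]
print(single_group)  # ['Nov 12', 'Feb 12']
```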

Here (?mi) combines the multiline and case-insensitive modifiers.
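
As a rough illustration, the inline (?mi) prefix should behave the same as passing re.MULTILINE | re.IGNORECASE to the function. The short pattern and sample text below are made up for this sketch. Note that the multiline flag only changes how ^ and $ behave, and the pattern in this answer uses neither, so the case-insensitive flag is the one doing the work there.

```python
import re

text = "july 2005\nDec 2012"

# Hypothetical anchored pattern, so that the multiline flag has a visible effect.
inline = re.findall(r"(?mi)^(?:jul|dec)[a-z]*\s\d{4}$", text)
explicit = re.findall(
    r"^(?:jul|dec)[a-z]*\s\d{4}$", text, re.MULTILINE | re.IGNORECASE
)
no_flags = re.findall(r"^(?:jul|dec)[a-z]*\s\d{4}$", text)

print(inline)    # ['july 2005', 'Dec 2012']
print(explicit)  # ['july 2005', 'Dec 2012']
print(no_flags)  # []
```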

Answer 1: (score: 1)

Your regex does work, but you have to remove the empty line at the end of the regex. See https://regex101.com/r/wU4xK1/3
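
A small sketch of the effect described here, assuming the pattern was pasted with a trailing line break. The shortened pattern is hypothetical and stands in for the full one. Outside verbose mode, a newline at the end of a pattern is a literal character, so the text must contain a line break right after the date range for a match to be found.

```python
import re

text = "july 2005 – Dec 2012 ## Note here."
pattern = r"[a-z]{3,9} \d{4} – [a-z]{3,9} \d{4}"  # simplified stand-in

# Pattern with a stray trailing newline, as if an empty line followed it.
with_newline = re.compile(pattern + "\n", re.IGNORECASE)
without_newline = re.compile(pattern, re.IGNORECASE)

no_match = with_newline.search(text)
clean_match = without_newline.search(text)
# Pressing Enter after "Dec 2012" supplies the newline the pattern demands.
enter_match = with_newline.search("july 2005 – Dec 2012\n## Note here.")

print(no_match)             # None
print(clean_match.group())  # july 2005 – Dec 2012
print(enter_match is not None)  # True
```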

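Answer 2: (score: 1)

In addition to Aaron's comment, which is correct (once that line break is removed, the matches show up), I'd also mention that \s matches any whitespace character in the class [\r\n\t\f ], so you can avoid capturing line breaks by restricting the group to [ \t\f].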
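
To illustrate the difference, here is a short sketch with a made-up sample string and a simplified pattern: \s lets a match run across a line break, while [ \t\f] keeps it on one line.

```python
import re

text = "Dec\n2012 and Dec 2013"

# \s also matches "\n", so the first match spans two lines.
with_s = re.findall(r"Dec\s{0,4}\d{4}", text)

# [ \t\f] excludes line breaks, so only the same-line date matches.
same_line = re.findall(r"Dec[ \t\f]{0,4}\d{4}", text)

print(with_s)     # ['Dec\n2012', 'Dec 2013']
print(same_line)  # ['Dec 2013']
```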