Regular expression to extract user-defined date formats

Asked: 2018-01-30 07:28:14

Tags: regex date

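I have a dataset full of strings, and I want to separate out the strings that contain dates. I wrote the following regular expression to extract them: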

print (re.findall(r'[Jan(uary)?|Feb(ruary)?|Mar(ch)?||April|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?]+\s\d+', x))

where x is the string being processed. I want to capture formats such as the following:

December 2018
Feb 11-12
Feb 12-Mar 21
3rd Jan
February 12

However, some unwanted strings are extracted as well, for example:

"Of 2017" from the string "BEST OF 2017"

"Line 1" from the string "Line 1"

"'addington 2" & "Paddington 2" from string "Paddington 2"

'hopping 3', 'as 20'

How can I fix these errors?

2 answers:

Answer 0: (score: 1)

The regular expression you are looking for is a bit more complex:

^(\d{1,2}\w{2} )?((Jan(uary)?|Feb(ruary)?|Mar(ch)?|April|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)[- \d]*)+$

Here's a full test
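
As a rough illustration, the pattern above can be tried from Python on the strings from the question. This is only a sketch: the helper name `is_date_string` and the sample list are made up for the example, and because the pattern is anchored with `^` and `$` it assumes each string in the dataset is either entirely a date or not a date at all.

```python
import re

# Pattern copied from the answer above. It is anchored, so the whole
# string has to look like a date for it to match.
DATE_RE = re.compile(
    r'^(\d{1,2}\w{2} )?'
    r'((Jan(uary)?|Feb(ruary)?|Mar(ch)?|April|May|Jun(e)?|Jul(y)?|Aug(ust)?'
    r'|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)[- \d]*)+$'
)

def is_date_string(text):
    """Hypothetical helper: True if the whole string matches the date pattern."""
    return DATE_RE.match(text) is not None

# Sample strings taken from the question.
samples = [
    "December 2018", "Feb 11-12", "Feb 12-Mar 21", "3rd Jan", "February 12",
    "BEST OF 2017", "Line 1", "Paddington 2",
]

dates = [s for s in samples if is_date_string(s)]
print(dates)
```

The five wanted formats are kept and "BEST OF 2017", "Line 1" and "Paddington 2" are dropped, because none of them starts with a month name or an ordinal day.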

Answer 1: (score: 0)

Tested on https://regex101.com/ and works as expected:

/(Jan(uary)?|Feb(ruary)?|Mar(ch)?|April|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s\d+/
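
For context, the square brackets in the question's pattern create a character class, so it matches any run of the listed letters rather than whole month names; the month alternation needs a group instead. Below is a minimal Python sketch of that idea for the "month followed by a number" form only. The names `MONTH`, `DATE_RE` and `find_dates` and the leading `\b` are assumptions added for the example. Non-capturing groups `(?:...)` are used because `re.findall` returns the group contents rather than the whole match when the pattern contains capturing groups.

```python
import re

# Month alternation wrapped in non-capturing groups so that
# re.findall returns the full matched text.
MONTH = (
    r'(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|April|May|Jun(?:e)?|Jul(?:y)?'
    r'|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)'
)

# \b (an addition for this sketch) stops a match from starting mid-word.
DATE_RE = re.compile(r'\b' + MONTH + r'\s\d+')

def find_dates(text):
    """Hypothetical helper: return every 'Month number' substring in text."""
    return DATE_RE.findall(text)

print(find_dates("Released December 2018"))   # ['December 2018']
print(find_dates("Sale runs Feb 12-Mar 21"))  # ['Feb 12', 'Mar 21']
print(find_dates("BEST OF 2017"))             # []
print(find_dates("Paddington 2"))             # []
```

This form does not cover "3rd Jan" or keep ranges such as "Feb 11-12" together; those would need the extra optional parts shown in the other answer.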