I have to use a regular expression to recognize different date formats in a string.
date can contain 21/12/2018
or 12/21/2018
or 2018/12/21
or 12/2018
or 21-12-2018
or 12-21-2018
or 2018-12-21
or 21-Jan-2018
or Jan 21,2018
or 21st Jan 2018
or 21-Jan-2018
or Jan 21,2018
or 21st Jan 2018
or Jan 21, 2018
or Jan 21, 2018
or 2018 Dec. 21
or 2018 Dec 21
or 21st of Jan 2018
or 21st of Jan 2018
or Jan 2018
or Jan 2018
or Jan. 2018
or Jan, 2018
or 2018
[should recognize (year only), (year and month), (year, month and day), year is mandatory in every date format to be recognized]
[months are abbreviated to three letters, first letter capital]
My regular expression is as follows:
\b(((((0?[1-9]|[12][0-9]|3[01])(\s*(st|nd|rd|th)?\s*(of)?\s*)?)|(20[012]\d)|(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))[\/\-\.\,\s]*){1,3})\b
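It does not work as expected and also matches other patterns. I have to recognize three patterns: (year only), (year and month), and (year, month and day). The year is mandatory in every date pattern to be recognized.
What corrections are needed to make it work? Please help.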
Answer 0: (score: 5)
IIUC, dateutil.parser is the better choice here:
import dateutil.parser as dparser
l = ["21/12/2018","12/21/2018","2018/12/21","12/2018",
"21-12-2018","12-21-2018","2018-12-21","21-Jan-2018",
"Jan 21,2018","21st Jan 2018","21-Jan-2018","Jan 21,2018",
"21st Jan 2018","Jan 21, 2018","Jan 21, 2018","2018 Dec. 21",
"2018 Dec 21","21st of Jan 2018","21st of Jan 2018","Jan 2018",
"Jan 2018","Jan. 2018","Jan, 2018","2018"]
[str(dparser.parse(i, fuzzy=True)) for i in l]
Output:
['2018-12-21 00:00:00',
'2018-12-21 00:00:00',
'2018-12-21 00:00:00',
'2018-12-07 00:00:00',
'2018-12-21 00:00:00',
'2018-12-21 00:00:00',
'2018-12-21 00:00:00',
'2018-01-21 00:00:00',
'2019-01-21 00:00:00',
'2018-01-21 00:00:00',
'2018-01-21 00:00:00',
'2019-01-21 00:00:00',
'2018-01-21 00:00:00',
'2018-01-21 00:00:00',
'2018-01-21 00:00:00',
'2018-12-21 00:00:00',
'2018-12-21 00:00:00',
'2018-01-21 00:00:00',
'2018-01-21 00:00:00',
'2018-01-07 00:00:00',
'2018-01-07 00:00:00',
'2018-01-07 00:00:00',
'2018-01-07 00:00:00',
'2018-08-07 00:00:00']
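Several results above carry a day, month, or year that the input never contained (for example '2018-12-07' for "12/2018" and '2018-08-07' for "2018"). dateutil fills fields missing from the input from its `default` argument, which falls back to the current date at midnight, so those values most likely reflect the day the snippet was run. A minimal sketch of pinning that baseline follows; `BASELINE` and `parse_with_baseline` are illustrative names, not part of dateutil.

```python
from datetime import datetime

import dateutil.parser as dparser

# Assumed baseline: any field missing from the input is taken from here
# instead of from today's date, so results are reproducible.
BASELINE = datetime(1900, 1, 1)


def parse_with_baseline(text):
    """Parse text fuzzily, filling missing fields from BASELINE."""
    return dparser.parse(text, fuzzy=True, default=BASELINE)


print(parse_with_baseline("12/2018"))   # day comes from BASELINE
print(parse_with_baseline("Jan 2018"))  # day comes from BASELINE
print(parse_with_baseline("2018"))      # month and day come from BASELINE
```

The entries that came out with the year 2019 (the "Jan 21,2018" inputs) suggest the year was not read from the string at all in that run, so it is worth checking results before trusting them.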
dateutil.parser can also handle sentences that contain date-like text (though not always):
s = 'The new millennium has finally come and it is now 1st of Jan 2000.'
str(dparser.parse(s, fuzzy=True))
# '2000-01-01 00:00:00'
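If a regular expression is still required, as the question asks, one option is to spell out each accepted layout as its own alternative so that the year is part of every branch. The following is only a sketch under assumptions: it reuses the question's year range (`20[012]\d`) and three-letter month names, it does not check that a day is valid for its month, and `DATE_RE`, `SAMPLES` and `find_dates` are illustrative names.

```python
import re

MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
NUM = r"(?:0?[1-9]|[12][0-9]|3[01])"   # day or month number, 1-31
YEAR = r"20[012]\d"                    # same year range as the question's pattern
SUFFIX = r"(?:st|nd|rd|th)?"

# Longer layouts come first so the year-only branch cannot cut a date short.
DATE_RE = re.compile(
    r"\b(?:"
    rf"{NUM}[/-]{NUM}[/-]{YEAR}"                            # 21/12/2018, 12-21-2018
    rf"|{YEAR}[/-]{NUM}[/-]{NUM}"                           # 2018/12/21
    rf"|{NUM}[/-]{YEAR}"                                    # 12/2018
    rf"|{NUM}{SUFFIX}(?:\s+of)?[\s-]+{MONTH}[\s-]+{YEAR}"   # 21st of Jan 2018
    rf"|{MONTH}\.?\s+{NUM}{SUFFIX},?\s*{YEAR}"              # Jan 21,2018
    rf"|{YEAR}\s+{MONTH}\.?\s+{NUM}"                        # 2018 Dec. 21
    rf"|{MONTH}[.,]?\s+{YEAR}"                              # Jan. 2018, Jan, 2018
    rf"|{YEAR}"                                             # 2018
    r")\b"
)

SAMPLES = [
    "21/12/2018", "12/21/2018", "2018/12/21", "12/2018",
    "21-12-2018", "12-21-2018", "2018-12-21", "21-Jan-2018",
    "Jan 21,2018", "21st Jan 2018", "Jan 21, 2018", "2018 Dec. 21",
    "2018 Dec 21", "21st of Jan 2018", "Jan 2018", "Jan. 2018",
    "Jan, 2018", "2018",
]


def find_dates(text):
    """Return every substring of text that matches one of the layouts."""
    return [m.group(0) for m in DATE_RE.finditer(text)]


s = 'The new millennium has finally come and it is now 1st of Jan 2000.'
print(find_dates(s))  # ['1st of Jan 2000']
```

Because every branch contains `YEAR`, inputs such as "21 Jan" or "21/12" produce no match. The pattern only locates candidate substrings; turning them into date values still needs a parser.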