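I am trying to extract dates from strings. The problem is that the date format varies a lot, because the strings come from OCR readings. These are the patterns I need to recognize:
So far, the regex I have is a slight adaptation of a Stack Overflow answer (it now allows a space, not only - or /, to separate the numbers):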
match_date=re.search(r'(?:(?:31(\/|-|\.| )(?:0?[13578]|1[02]))\1|(?:(?:29|30)(\/|-|\.| )(?:0?[1,3-9]|1[0-2])\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29(\/|-|\.)0?2\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(\/|-|\.| )(?:(?:0?[1-9])|(?:1[0-2]))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})',line)
Is there a way to build a single regex for such a "fluid" date structure?
Answer 0 (score: 2)
Regex: \b(?:\d{1,2}[- /]\s?){2}(?:\d{4}|\d{2})\b
or, anchored so the whole string must be the date: ^(?:\d{1,2}[- /]\s?){2}(?:\d{4}|\d{2})$
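A quick way to try this pattern in Python is sketched below. The sample lines are hypothetical OCR-style strings (the question's own examples are not shown), and `DATE_RE` and `samples` are illustrative names.

```python
import re

# Pattern from the answer: two groups of 1-2 digits, each followed by '-', '/'
# or a space plus at most one extra whitespace character, then a 2- or 4-digit year.
DATE_RE = re.compile(r"\b(?:\d{1,2}[- /]\s?){2}(?:\d{4}|\d{2})\b")

# Hypothetical OCR-style input lines.
samples = [
    "Invoice date 11- 11- 2019 total 42",
    "Paid on 3/7/19",
    "Due 05 12 2020",
    "No date here",
]

for text in samples:
    match = DATE_RE.search(text)  # first date-shaped substring, or None
    print(repr(text), "->", match.group(0) if match else None)
```

On these samples the first three lines yield 11- 11- 2019, 3/7/19 and 05 12 2020, and the last yields None. The pattern only checks the shape of the text, not the value ranges, so something like 99 99 99 also matches; the matched text still needs validating.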
Answer 1 (score: 1)
Answer 2 (score: 1)
I know a regex is arguably the better answer, because a single pattern can match all the possibilities, but I prefer converting to datetime:
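from datetime import datetime

string = "11- 11- 1111"
datetime_object = None
for fmt in ('%Y-%m-%d', '%d- %m- %Y', '%d %m %Y', '%d- %m- %y'):
    try:
        # Parse with the candidate format of the current iteration
        datetime_object = datetime.strptime(string, fmt)
        break  # the first format that parses wins
    except ValueError:
        continue  # this format does not fit; try the next one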
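The two answers can also be combined: use the regex to find date-shaped candidates, normalize their separators, then let strptime reject impossible values such as 31 02 2020. The sketch below assumes day-month-year order; `parse_ocr_date` and `FORMATS` are hypothetical names, not part of either answer.

```python
import re
from datetime import datetime

# Candidate finder: the pattern from answer 0 (checks shape only, not ranges).
DATE_RE = re.compile(r"\b(?:\d{1,2}[- /]\s?){2}(?:\d{4}|\d{2})\b")

# Assumption: day-month-year order; 4-digit year tried before 2-digit year.
FORMATS = ("%d-%m-%Y", "%d-%m-%y")


def parse_ocr_date(text):
    """Return the first valid date found in text as a datetime, or None."""
    for match in DATE_RE.finditer(text):
        # Collapse each separator variant ("- ", "/", " ", "-") into a single "-".
        normalized = re.sub(r"[- /]\s?", "-", match.group(0))
        for fmt in FORMATS:
            try:
                return datetime.strptime(normalized, fmt)
            except ValueError:
                continue  # wrong format or impossible date; try the next format
    return None  # no candidate produced a valid date


print(parse_ocr_date("Invoice date 11- 11- 2019"))
print(parse_ocr_date("ref 99 99 99 on 3/7/19"))
print(parse_ocr_date("31 02 2020"))
```

Because invalid candidates are skipped rather than returned, a shape-only regex is enough here: strptime does the range checking, and the loop moves on to the next candidate in the line.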