I have the following string:
dateEntries = "04-20-2009; 04/20/09; 4/20/09; 4/3/09; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009; 20 Mar 2009; 20 March 2009; 2 Mar. 2009; 20 March, 2009; Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009; Feb 2009; Sep 2009; Oct 2010; 6/2008; 12/2009; 2009; 2010"
Here I want to use a regex to extract all the dates mentioned. As a first attempt, I wrote the following regex:
import re
regEx = r'(?:\d{1,2}[-/th|st|nd|rd\s]*)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z\s,.]*(?:\d{1,2}[-/th|st|nd|rd)\s,]*)?(?:\d{2,4})'
re.findall(regEx, dateEntries)
I expected it to work, but it only returns a subset of the dates:
A = ['Mar 20, 2009',
'March 20, 2009',
'Mar. 20, 2009',
'Mar 20 2009',
'20 Mar 2009',
'20 March 2009',
'2 Mar. 2009',
'20 March, 2009',
'Mar 20th, 2009',
'Mar 21st, 2009',
'Mar 22nd, 2009',
'Feb 2009',
'Sep 2009',
'Oct 2010']
I don't understand why it does not return these dates:
B = ['04-20-2009', '04/20/09', '4/20/09', '4/3/09', '6/2008', '12/2009', '2009', '2010']
By extending regEx I came up with r'(?:\d{1,2}[-\s\/])?(?:\d{1,2}[-\/\s])?(?:\d{2,4})', which works well for set B. But that regEx does not produce A + B.
Can anyone help me write a regular expression that extracts all the dates mentioned in my dateEntries?
Note: I want to solve this using regular expressions only.
Answer 0 (score: 2)
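You are just missing a ? after the (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) group, to mark it as optional. I also added a + after the last two groups, to make sure the regex does not split a date such as "March 20, 2009" into two different dates.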
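To see the effect of the missing ? in isolation, here is a minimal sketch. The two patterns are deliberately simplified stand-ins, not the full regex from the question, and the names required, optional, and sample are only for illustration.

```python
import re

months = r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)'

# Month group is mandatory: entries without a month name can never match.
required = re.compile(rf'{months}\s*\d{{2,4}}')
# Month group is optional: a bare year is now accepted as well.
optional = re.compile(rf'{months}?\s*\d{{2,4}}')

sample = "Feb 2009; 2010"
required_hits = required.findall(sample)
optional_hits = optional.findall(sample)
print(required_hits)  # ['Feb 2009']
print(optional_hits)  # ['Feb 2009', ' 2010']
```

Note the leading space in the second result of the optional variant: the whitespace before the digits becomes part of the match.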
Full code:
import re
regEx = r'(?:\d{1,2}[-/th|st|nd|rd\s]*)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)?[a-z\s,.]*(?:\d{1,2}[-/th|st|nd|rd)\s,]*)+(?:\d{2,4})+'
dateEntries = "04-20-2009; 04/20/09; 4/20/09; 4/3/09; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009; 20 Mar 2009; 20 March 2009; 2 Mar. 2009; 20 March, 2009; Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009; Feb 2009; Sep 2009; Oct 2010; 6/2008; 12/2009; 2009; 2010"
result = re.findall(regEx, dateEntries)
print(result)
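If your dates have leading whitespace, the results will have leading whitespace too. If you go on to work with the date strings, you can remove it, e.g. with the .strip() method.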
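A small sketch of that clean-up step, using a simplified stand-in pattern (a bare four-digit year with optional leading whitespace) rather than the full regex above:

```python
import re

sample = "2009; 2010"

# \s* lets the match swallow the space that follows the semicolon.
raw = re.findall(r'\s*\d{4}', sample)
# str.strip() removes leading and trailing whitespace from each match.
cleaned = [m.strip() for m in raw]

print(raw)      # ['2009', ' 2010']
print(cleaned)  # ['2009', '2010']
```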
Answer 1 (score: 0)
Try this regex:
^(?:\d{1,2}(?:(?:-|/)|(?:th|st|nd|rd)?\s))?(?:(?:(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)(?:(?:-|/)|(?:,|\.)?\s)?)?(?:\d{1,2}(?:(?:-|/)|(?:th|st|nd|rd)?\s))?)(?:\d{2,4})$
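Because this pattern is anchored with ^ and $, it is meant to be tested against one entry at a time rather than passed to re.findall on the whole string. The sketch below assumes the entries are separated by semicolons, as in the question, and simply reports which ones the pattern accepts. The names pattern, entries, matched, and unmatched are mine.

```python
import re

dateEntries = "04-20-2009; 04/20/09; 4/20/09; 4/3/09; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009; 20 Mar 2009; 20 March 2009; 2 Mar. 2009; 20 March, 2009; Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009; Feb 2009; Sep 2009; Oct 2010; 6/2008; 12/2009; 2009; 2010"

# The pattern from the answer, split over several lines for readability only.
pattern = re.compile(
    r'^(?:\d{1,2}(?:(?:-|/)|(?:th|st|nd|rd)?\s))?'
    r'(?:(?:(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?'
    r'|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)'
    r'(?:(?:-|/)|(?:,|\.)?\s)?)?'
    r'(?:\d{1,2}(?:(?:-|/)|(?:th|st|nd|rd)?\s))?)'
    r'(?:\d{2,4})$'
)

# Split the string into individual entries and test each one on its own.
entries = [e.strip() for e in dateEntries.split(';')]
matched = [e for e in entries if pattern.match(e)]
unmatched = [e for e in entries if not pattern.match(e)]

print("matched:  ", matched)
print("unmatched:", unmatched)
```

By my reading, entries in which a comma directly follows the day, such as "Mar 20, 2009" or "Mar 20th, 2009", end up in unmatched, because the day group expects "-", "/", or whitespace right after the digits and the optional suffix.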
Answer 2 (score: 0)
Your regex pattern is completely unreadable. Please build it from simple building blocks; that will make the code more readable.
import re
import calendar
full_months = [month for month in calendar.month_name if month]
short_months = [d[:3] for d in full_months]
months = '|'.join(short_months + full_months)
sep = r'[.,]?\s+' # separator
day = r'\d+'
year = r'\d+'
day_or_year = r'\d+(?:\w+)?'
r = re.compile(rf'(?:{day}{sep})?(?:{months}){sep}{day_or_year}(?:{sep}{year})?')
r.findall(dateEntries)
# ['Mar 20, 2009', 'March 20, 2009', 'Mar. 20, 2009', 'Mar 20 2009', '20 Mar 2009', '20 March 2009', '2 Mar. 2009', '20 March, 2009', 'Mar 20th, 2009', 'Mar 21st, 2009', 'Mar 22nd, 2009', 'Feb 2009', 'Sep 2009', 'Oct 2010']
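The output above covers only the dates that contain a month name (set A in the question). One possible way to also pick up the numeric dates is to keep the building-block approach and add a second alternative. This is a sketch checked only against the sample string: the names suffix, textual, numeric, and date_re are mine, and the month names are hard-coded here so the result does not depend on the locale.

```python
import re

dateEntries = "04-20-2009; 04/20/09; 4/20/09; 4/3/09; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009; 20 Mar 2009; 20 March 2009; 2 Mar. 2009; 20 March, 2009; Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009; Feb 2009; Sep 2009; Oct 2010; 6/2008; 12/2009; 2009; 2010"

full_months = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
               'August', 'September', 'October', 'November', 'December']
short_months = [m[:3] for m in full_months]
months = '|'.join(full_months + short_months)

sep = r'[.,]?\s+'             # optional '.' or ',' followed by whitespace
day = r'\d{1,2}'
suffix = r'(?:st|nd|rd|th)?'  # optional ordinal suffix
year = r'\d{2,4}'

# Dates with a month name, e.g. "20 March, 2009" or "Mar 20th, 2009".
textual = rf'(?:{day}{sep})?(?:{months}){sep}(?:{day}{suffix}{sep})?{year}'
# Purely numeric dates, e.g. "04-20-2009", "6/2008", or a bare "2009".
numeric = rf'(?:{day}[-/])?(?:{day}[-/])?{year}'

# The textual alternative goes first; otherwise the leading "20" of
# "20 Mar 2009" would be consumed on its own by the numeric alternative.
date_re = re.compile(rf'{textual}|{numeric}')
found = date_re.findall(dateEntries)
print(found)
```

On the sample string this returns all 22 entries in their original order. Free-form text, where digits can appear in other contexts, would likely need word boundaries or stricter validation.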
Answer 3 (score: 0)
You can try the following regex:
(?:\d{1,2}[-/th|st|nd|rd\s]*)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)?[a-z\s,.]*(?:\d{1,2}[-/th|st|nd|rd)\s,]*)+(?:\d{2,4})+
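One detail worth knowing when reading this pattern (my observation, not something stated in the thread): inside square brackets, th|st|nd|rd is a set of single characters, not an alternation of suffixes. The sketch below contrasts the character class with a non-capturing group; the names char_class and group are only for illustration.

```python
import re

# Inside [...], "th|st|nd|rd" is just the characters t, h, |, s, n, d, r,
# so any run of those characters (plus '-', '/', whitespace) is accepted.
char_class = re.compile(r'\d{1,2}[-/th|st|nd|rd\s]*')
# A non-capturing group with alternation accepts only the four real suffixes.
group = re.compile(r'\d{1,2}(?:st|nd|rd|th)?')

class_accepts_junk = char_class.fullmatch('20|dr|') is not None
group_accepts_suffix = group.fullmatch('20th') is not None
group_rejects_junk = group.fullmatch('20ht') is None

print(class_accepts_junk, group_accepts_suffix, group_rejects_junk)  # True True True
```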