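I need to implement a function in Python that extracts dates in several formats from an input string, converts them to one specific format, and returns only the date: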
Format        Example input string
MMDDYYYY      foo.bar.02242015.txt
MMDDYY        foo.bar.022415.txt
MONCCYY       foo.bar.FEB2015.txt
YYYY-MM-DD    foo_bar_2015-02-01_2015-02-28.txt
YYYYMMDD      foo_bar_20150224.txt
MM_YY         foo_bar_02_15.txt
YYYYMMDD      foo_bar_20150224.txt
Output: just the date, in a fixed 8-digit format (no foo, bar or txt):
YYYYMMDD (e.g. 20120524)
Examples:
Input Output
foo.bar.02242015.txt -> 20150224
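Some requirements, illustrated by the examples below: a date with only a month and year becomes the last day of that month, a date range becomes its end date, and two-digit years are read as 20YY: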
foo_02_15.txt -> 20150228
foo_02_24_16.txt -> 20160224
foo.FEB2015.txt -> 20150228
foo_2015-02-01_2015-02-28.txt -> 20150228
Does anyone know how to do this with regex in Python? Or what is the best practice?
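Two of the expected outputs, `foo_02_15.txt -> 20150228` and `foo.FEB2015.txt -> 20150228`, imply that a month-and-year-only date resolves to the last day of that month. A minimal sketch of that rule, assuming the standard library's `calendar.monthrange` (it returns the weekday of the month's first day and the number of days in the month); `month_end` is a hypothetical helper name, not something from the question:

```python
import calendar


def month_end(year, month):
    """Return the last day of the given month as a YYYYMMDD string (hypothetical helper)."""
    # calendar.monthrange returns (weekday of the 1st, number of days in the month)
    days_in_month = calendar.monthrange(year, month)[1]
    return '%04d%02d%02d' % (year, month, days_in_month)


print(month_end(2015, 2))  # 20150228
print(month_end(2016, 2))  # 20160229 (2016 is a leap year)
```

Using `monthrange` rather than a hard-coded table of month lengths means leap years are handled without extra logic.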
Answer 0 (score: 0)
UPDATE 2: Try the following (Python 2.7):
from __future__ import print_function  # lets the print() call below work on Python 2.7 and 3
import re
import calendar

INPUT = ['foo.bar.02242015.txt',
         'foo.bar.022415.txt',
         'foo.bar.FEB2015.txt',
         'foo_bar_2015-02-01_2015-02-28.txt',
         'foo_bar_20150224.txt',
         'foo_bar_02_15.txt',
         'foo_bar_20150224.txt']

# P1: MMDDYYYY or MMDDYY; groups: 1 = month, 2 = day, 3 = year (2 or 4 digits)
P1 = r'(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])((?:19|20)?\d{2})'
# P2: MONCCYY (e.g. FEB2015), YYYY-MM-DD with an optional trailing '_', or MM_YY
P2 = r'[A-Z]{3}[12]\d{3}|[12]\d{3}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])_?|(?:0[1-9]|1[0-2])_[12]\d'
MONTHS = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']

def StrFormat(date_string):
    m2 = re.findall(P2, date_string)
    if m2:
        for m in m2:
            if len(m) == 5:
                # MM_YY: assume 20YY and use the last day of the month
                month, year = m.split('_')[0], '20' + m.split('_')[1]
                last_day = calendar.monthrange(int(year), int(month))[1]
                date_string = re.sub(P2, year + month + str(last_day), date_string, 1)
            elif len(m) == 7:
                # MONCCYY: map the month name to 01-12 and use the last day of the month
                month, year = str(MONTHS.index(m[0:3]) + 1).zfill(2), m[3:]
                last_day = calendar.monthrange(int(year), int(month))[1]
                date_string = re.sub(P2, year + month + str(last_day), date_string, 1)
            elif len(m) == 10:
                # YYYY-MM-DD: drop the dashes
                date_string = re.sub(P2, m.replace('-', ''), date_string, 1)
            elif len(m) > 5:
                # 'YYYY-MM-DD_' (start date of a range): remove it, keep the end date
                date_string = re.sub(P2, '', date_string, 1)
    m1 = re.findall(P1, date_string)
    if m1:
        for m in m1:
            if len(m[2]) == 2:
                # MMDDYY -> 20YYMMDD
                date_string = re.sub(P1, r'20\3\1\2', date_string, 1)
            elif len(m[2]) == 4:
                # MMDDYYYY -> YYYYMMDD
                date_string = re.sub(P1, r'\3\1\2', date_string, 1)
            elif len(m) > 2:
                date_string = re.sub(P1, '', date_string, 1)
    return date_string

for i in INPUT:
    print(i.ljust(35), '->', StrFormat(i).rjust(20))
Output:
foo.bar.02242015.txt -> foo.bar.20150224.txt
foo.bar.022415.txt -> foo.bar.20150224.txt
foo.bar.FEB2015.txt -> foo.bar.20150228.txt
foo_bar_2015-02-01_2015-02-28.txt -> foo_bar_20150228.txt
foo_bar_20150224.txt -> foo_bar_20150224.txt
foo_bar_02_15.txt -> foo_bar_20150228.txt
foo_bar_20150224.txt -> foo_bar_20150224.txt
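The `P1` step above captures month, day and year as separate groups and lets `re.sub` reorder them, prefixing `20` when the year has only two digits. The sketch below isolates that step, reusing `P1` unchanged but passing a replacement function instead of the two backreference templates; `reorder_mmddyy` is an illustrative name, and like the answer it assumes two-digit years belong to the 2000s:

```python
import re

# P1 from the answer above: month, day, and a 2- or 4-digit year as groups 1-3
P1 = r'(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])((?:19|20)?\d{2})'


def reorder_mmddyy(text):
    """Rewrite the first MMDDYY/MMDDYYYY run in text as YYYYMMDD (illustrative)."""
    def repl(m):
        month, day, year = m.groups()
        if len(year) == 2:
            year = '20' + year  # assumption: two-digit years are 20xx
        return year + month + day
    return re.sub(P1, repl, text, count=1)


print(reorder_mmddyy('foo.bar.02242015.txt'))  # foo.bar.20150224.txt
print(reorder_mmddyy('foo.bar.022415.txt'))    # foo.bar.20150224.txt
```

A replacement function keeps the year fix-up and the reordering in one place, so there is no need to branch on the `findall` result and call `re.sub` separately for each case.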
By the way, as noob suggested: 10% regex + 90% programming :-)
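In that spirit, one way to keep the regex part small is a table pairing each date pattern with the `datetime.strptime` formats to try, applying the month-end rule only to formats without a day. This is a sketch under stated assumptions, not part of either answer: the `FORMATS` table and `extract_date` are hypothetical names, ambiguous 8-digit tokens are tried as `YYYYMMDD` before `MMDDYYYY`, `%y` follows `strptime`'s two-digit-year rule, `%b` assumes an English (C) locale, and for a range the last date found wins. Written for Python 3:

```python
import calendar
import re
from datetime import datetime

# Hypothetical table: (pattern for the date token, strptime formats to try, has_day).
# More specific patterns come first so that, e.g., MM_DD_YY wins over MM_YY.
FORMATS = [
    (r'\d{4}-\d{2}-\d{2}', ['%Y-%m-%d'], True),     # YYYY-MM-DD (also ranges)
    (r'[A-Z]{3}\d{4}', ['%b%Y'], False),             # MONCCYY, e.g. FEB2015
    (r'\d{2}_\d{2}_\d{2}', ['%m_%d_%y'], True),     # MM_DD_YY
    (r'\d{2}_\d{2}', ['%m_%y'], False),              # MM_YY
    (r'\d{8}', ['%Y%m%d', '%m%d%Y'], True),          # YYYYMMDD or MMDDYYYY
    (r'\d{6}', ['%m%d%y'], True),                    # MMDDYY
]


def extract_date(name):
    """Return the last date found in name as YYYYMMDD, or None (illustrative sketch)."""
    for pattern, fmts, has_day in FORMATS:
        # (?<!\d) / (?!\d) stop a pattern from matching part of a longer digit run
        tokens = re.findall(r'(?<!\d)' + pattern + r'(?!\d)', name)
        if not tokens:
            continue
        token = tokens[-1]  # for a date range, keep the end date
        for fmt in fmts:
            try:
                d = datetime.strptime(token, fmt)
            except ValueError:
                continue  # token does not fit this format; try the next one
            if not has_day:
                # Only month and year given: use the last day of that month
                d = d.replace(day=calendar.monthrange(d.year, d.month)[1])
            return d.strftime('%Y%m%d')
    return None


for name in ['foo.bar.02242015.txt', 'foo_02_24_16.txt', 'foo_2015-02-01_2015-02-28.txt']:
    print(name, '->', extract_date(name))
```

Ordering the table from most to least specific is what resolves overlaps such as `MM_DD_YY` versus `MM_YY`; adding a new filename format then means adding one row rather than another branch.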
Answer 1 (score: 0)
Try this:
from __future__ import print_function  # lets the print() call below work on Python 2 and 3
import re
import time
import calendar

# Group 1: the token between a '.' or '_' and the next '.'; group 2: its last (at most) 10 characters
p = re.compile(r'(?<=\.|_)([A-Z\d+_-]*?([A-Z\d+_-]{0,10}))(?=\.)')
test_str = u"Format Example Input String \n\nMMDDYYYY foo.bar.02242015.txt\nMMDDYY foo.bar.022415.txt\nMONCCYY foo.bar.FEB2015.txt\nYYYY-MM-DD foo_bar_2015-02-01_2015-02-28.txt\nYYYYMMDD foo_bar_20150224.txt\nMM_YY foo_bar_02_15.txt\nYYYYMMDD foo_bar_20150224.txt"

def changedate(date):
    # Try every format: a block that parses overwrites t, a block that fails is skipped
    try:
        t = time.strptime(date, '%m%d%Y')
    except:
        pass
    try:
        t = time.strptime(date, '%m%d%y')
    except:
        pass
    try:
        # Month and year only: append the month's last day and parse again
        t = time.strptime(date, '%b%Y')
        lastday = calendar.monthrange(int(t.tm_year), int(t.tm_mon))[1]
        t = time.strptime(date + str(lastday), '%b%Y%d')
    except:
        pass
    try:
        t = time.strptime(date, '%m_%y')
        lastday = calendar.monthrange(int(t.tm_year), int(t.tm_mon))[1]
        t = time.strptime(date + str(lastday), '%m_%y%d')
    except:
        pass
    try:
        t = time.strptime(date, '%Y-%m-%d')
    except:
        pass
    try:
        r = time.strftime("%Y%m%d", t)
        return r
    except:
        # t was never assigned because no format matched; return the input unchanged
        pass
    return date

test_str = re.sub(p, lambda m: changedate(m.group(2)), test_str)
print(test_str)
Input:
Format Example Input String
MMDDYYYY foo.bar.02242015.txt
MMDDYY foo.bar.022415.txt
MONCCYY foo.bar.FEB2015.txt
YYYY-MM-DD foo_bar_2015-02-01_2015-02-28.txt
YYYYMMDD foo_bar_20150224.txt
MM_YY foo_bar_02_15.txt
YYYYMMDD foo_bar_20150224.txt
Output:
Format Example Input String
MMDDYYYY foo.bar.20150224.txt
MMDDYY foo.bar.20150224.txt
MONCCYY foo.bar.20150228.txt
YYYY-MM-DD foo_bar_20150228.txt
YYYYMMDD foo_bar_20150224.txt
MM_YY foo_bar_20150228.txt
YYYYMMDD foo_bar_20150224.txt
Explanation:
For example, given the input
foo_bar_2015-02-01_2015-02-28.txt
the regex
(?<=\.|_)([A-Z\d+_-]*?([A-Z\d+_-]{0,10}))(?=\.)
captures the date string in match m:
Group 1: `2015-02-01_2015-02-28`
Group 2: `2015-02-28`
m.group(0) = 2015-02-01_2015-02-28
m.group(1) = 2015-02-01_2015-02-28
m.group(2) = 2015-02-28
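To check these group values, the answer's pattern can be run against the single filename with `re.search` (shown here for Python 3, so the `ur` prefix becomes `r`). The lazy `*?` in group 1 grows one character at a time until the greedy `{0,10}` in group 2 can reach the `.`, so group 2 ends up holding the last ten characters of the token (or all of it, if the token is shorter):

```python
import re

# Pattern from the answer, unchanged apart from the string prefix
p = re.compile(r'(?<=\.|_)([A-Z\d+_-]*?([A-Z\d+_-]{0,10}))(?=\.)')

m = p.search('foo_bar_2015-02-01_2015-02-28.txt')
print(m.group(0))  # 2015-02-01_2015-02-28
print(m.group(1))  # 2015-02-01_2015-02-28
print(m.group(2))  # 2015-02-28
```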
Then
lambda m: changedate(m.group(2))
reformats the date. Here group 2 is
2015-02-28
which fails in every other block, such as
try:
    t = time.strptime(date,'%m%d%Y')
except:
    pass
and only parses in this one:
try:
    t = time.strptime(date,'%Y-%m-%d')
except:
    pass
It is then formatted as YYYYMMDD by:
try:
    r = time.strftime("%Y%m%d",t)
    return r
except:
    pass
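The chain of `try` blocks in `changedate` amounts to "try each format in turn and use the one that parses". A more compact sketch of the same idea loops over format lists with `datetime.strptime`; the names `DAY_FORMATS`, `MONTH_FORMATS` and `reformat_date` are hypothetical, `%b` assumes an English (C) locale, and unlike the answer it returns on the first format that parses rather than letting a later block overwrite `t` (for the sample tokens each one parses under only one format, so the result is the same). Written for Python 3:

```python
import calendar
from datetime import datetime

# Formats that include a day
DAY_FORMATS = ['%m%d%Y', '%m%d%y', '%Y-%m-%d']
# Formats with only month and year: these get the last day of the month
MONTH_FORMATS = ['%b%Y', '%m_%y']


def reformat_date(token):
    """Return token as YYYYMMDD, or the token unchanged if no format parses."""
    for fmt in DAY_FORMATS:
        try:
            return datetime.strptime(token, fmt).strftime('%Y%m%d')
        except ValueError:
            pass  # not this format; try the next one
    for fmt in MONTH_FORMATS:
        try:
            d = datetime.strptime(token, fmt)
        except ValueError:
            continue
        last_day = calendar.monthrange(d.year, d.month)[1]
        return d.replace(day=last_day).strftime('%Y%m%d')
    return token


print(reformat_date('2015-02-28'))  # 20150228
print(reformat_date('02_15'))       # 20150228
```

Catching only `ValueError` (what `strptime` raises on a mismatch) and returning explicitly avoids relying on a bare `except` to cover the case where no format matched.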