I have a dataset in which the date strings do not follow a single fixed format.
For example:
Fri, 13 Apr 2018 13:13:12 +0000 (UTC)
Mon, 26 Mar 2018 06:32:59 +0100
Tue, 05 Dec 2017 11:03:34 GMT
08 Dec 2016 12:00:24
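How can I use a regular expression to extract the date, the hour (plus offset), and the minute from these strings without hand-writing the parsing code?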
Answer 0 (score: 10)
Use timestring:
import timestring
dt_1 = "Fri, 13 Apr 2018 13:13:12 +0000 (UTC)"
dt_2 = "Mon, 26 Mar 2018 06:32:59 +0100"
dt_3 = "Tue, 05 Dec 2017 11:03:34 GMT"
dt_4 = "08 Dec 2016 12:00:24"
print(timestring.Date(dt_1))
print(timestring.Date(dt_2))
print(timestring.Date(dt_3))
print(timestring.Date(dt_4))
Edit:
While I'm at it, here is another, cooler approach:
Use dateutil.parser (imported here as dparser):
import dateutil.parser as dparser
dt_1 = "Fri, 13 Apr 2018 13:13:12 +0000 (UTC)"
dt_2 = "Mon, 26 Mar 2018 06:32:59 +0100"
dt_3 = "Tue, 05 Dec 2017 11:03:34 GMT"
dt_4 = "08 Dec 2016 12:00:24"
print(dparser.parse(dt_1,fuzzy=True))
print(dparser.parse(dt_2,fuzzy=True))
print(dparser.parse(dt_3,fuzzy=True))
print(dparser.parse(dt_4,fuzzy=True))
Output:
2018-04-13 13:13:12+00:00
2018-03-26 06:32:59+01:00
2017-12-05 11:03:34+00:00
2016-12-08 12:00:24
Edit 2:
Why is dparser cooler?
It raises a ValueError for invalid dates:
invalid_dt = "Fri, 35 Apr 2018 13:13:12 +0000 (UTC)"
print(dparser.parse(invalid_dt,fuzzy=True))
Output:
ValueError: day is out of range for month
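Because invalid dates raise an exception, a dataset with a few bad rows can be handled by catching it. The following is a minimal sketch, not part of the original answer: `safe_parse` is a hypothetical helper name, and it assumes that returning `None` for unparseable rows is acceptable. Recent dateutil releases raise `ParserError`, which is a subclass of `ValueError`, so catching `ValueError` should cover both cases.

```python
import dateutil.parser as dparser


def safe_parse(text):
    """Return a datetime for text, or None if dateutil rejects it.

    safe_parse is a hypothetical helper, not a dateutil function.
    """
    try:
        return dparser.parse(text, fuzzy=True)
    except ValueError:
        # Covers dateutil's ParserError as well, since it subclasses ValueError.
        return None


rows = [
    "Fri, 13 Apr 2018 13:13:12 +0000 (UTC)",
    "Fri, 35 Apr 2018 13:13:12 +0000 (UTC)",  # invalid day
]
parsed = [safe_parse(row) for row in rows]
valid = [dt for dt in parsed if dt is not None]
print(len(valid), "of", len(rows), "rows parsed")
```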
Edit 3:
To get the day, month, year, hour, minute, or second:
print(dparser.parse(dt_1,fuzzy=True).day) # 13
print(dparser.parse(dt_2,fuzzy=True).month) # 3
print(dparser.parse(dt_3,fuzzy=True).year) # 2017
print(dparser.parse(dt_4,fuzzy=True).hour) # 12
print(dparser.parse(dt_4,fuzzy=True).minute) # 0
print(dparser.parse(dt_4,fuzzy=True).second) # 24
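The question also asks for the offset. One way to get it, sketched here under the assumption that a plain dictionary is a convenient result shape, is to parse each string once and read `utcoffset()` from the resulting datetime. `extract_fields` is a hypothetical helper name; `utcoffset()` returns `None` when the string carries no timezone information, as in the last sample.

```python
import dateutil.parser as dparser


def extract_fields(text):
    """Parse text once and return the pieces asked for in the question.

    extract_fields is a hypothetical helper, not a dateutil function.
    """
    dt = dparser.parse(text, fuzzy=True)
    return {
        "date": dt.date(),
        "hour": dt.hour,
        "minute": dt.minute,
        # timedelta for timezone-aware results, None for naive ones
        "utc_offset": dt.utcoffset(),
    }


for sample in ("Mon, 26 Mar 2018 06:32:59 +0100", "08 Dec 2016 12:00:24"):
    print(extract_fields(sample))
```

Parsing once and reusing the datetime also avoids calling `parse` again for every field.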
Edit 4:
If you want the name of the weekday:
print(dparser.parse(dt_1,fuzzy=True).strftime("%a")) # Fri