Parsing dates with pandas: how to take the time zone into account?

Time: 2018-01-23 08:30:27

Tags: python pandas timezone strftime date-parsing

I have dates in these formats:

Thursday, September 22, 2016 at 11:04am UTC+02
Monday, January 22, 2018 at 6:46pm CST
...

I want to convert them to UNIX timestamps. This pattern works, but it ignores the time zone:

timestamp = pd.to_datetime(date, format='%A, %B %d, %Y at %I:%M%p', exact=False)  # %I (12-hour clock) so that %p (am/pm) is applied

I don't know how to take the time zone ("UTC+02", "CST") into account.

This does not work:

timestamp = pd.to_datetime(date, format='%A, %B %d, %Y at %I:%M%p %Z')
# ValueError: unconverted data remains: +02

2 Answers:

Answer 0: (score: 0)

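The "ValueError: unconverted data remains: +02" is raised because strptime has to consume the entire date string, and your format leaves the offset (the %z part) unhandled. In Python 2, however, strptime does not accept %z at all; see "ISO to datetime object: 'z' is a bad directive". In Python 3 the directive exists, but it expects an offset such as +0200, so a bare +02 still would not match.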

So perhaps you can map a more lenient parser over your data instead:

import dateutil.parser
timestamp = date.map(dateutil.parser.parse)  # date is assumed to be a pandas Series of strings
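
If the strings only use the UTC+02 form, the %z route can still be made to work on Python 3 by rewriting the offset into the +HHMM shape that strptime expects. The following is a minimal sketch using only the standard library; the names FORMAT and parse_utc_offset_date are hypothetical, and it assumes that "UTC+02" means two hours ahead of UTC. It does not handle abbreviations such as CST.

```python
import re
from datetime import datetime

# 12-hour clock (%I) with am/pm (%p), followed by a numeric UTC offset (%z).
FORMAT = '%A, %B %d, %Y at %I:%M%p %z'

def parse_utc_offset_date(text):
    """Parse a date string that ends in 'UTC+HH' or 'UTC-HH' (hypothetical helper)."""
    # Rewrite a trailing "UTC+02" / "UTC-5" as "+0200" / "-0500" so %z can match it.
    normalized = re.sub(
        r'UTC([+-])(\d{1,2})$',
        lambda m: f'{m.group(1)}{int(m.group(2)):02d}00',
        text,
    )
    return datetime.strptime(normalized, FORMAT)

parsed = parse_utc_offset_date('Thursday, September 22, 2016 at 11:04am UTC+02')
print(parsed.isoformat())
print(int(parsed.timestamp()))  # seconds since the UNIX epoch
```

Because the result is timezone-aware, timestamp() does not depend on the local time zone of the machine running the code.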

Answer 1: (score: 0)

I know you asked for a Pandas solution, but dateutil can parse your strings:

import dateutil.parser
from dateutil.tz import gettz

samples = ['Thursday, September 22, 2016 at 11:04am UTC+02',
           'Monday, January 22, 2018 at 6:46pm CST']

# American time zone abbreviations
tzinfos = {'HAST': gettz('Pacific/Honolulu'),
           'AKST': gettz('America/Anchorage'),
           'PST': gettz('America/Los_Angeles'),
           'MST': gettz('America/Phoenix'),
           'CST': gettz('America/Chicago'),
           'EST': gettz('America/New_York'),
          }

for s in samples:
    parsed = dateutil.parser.parse(s, fuzzy=True, tzinfos=tzinfos)
    print(s, '->', parsed)

Output:

Thursday, September 22, 2016 at 11:04am UTC+02 -> 2016-09-22 11:04:00-02:00
Monday, January 22, 2018 at 6:46pm CST -> 2018-01-22 18:46:00-06:00
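
Note the sign in the first result: dateutil appears to read "UTC+02" in the POSIX sense ("local time plus two hours is UTC"), which is why it carries an offset of -02:00. If the source strings instead mean "two hours ahead of UTC" (an assumption about the data), the offset has to be reversed before converting to a UNIX timestamp. The sketch below uses only the standard library and rebuilds the first parsed value by hand; flip_offset is a hypothetical helper, and it should only be applied to values that came from the "UTC+HH" form, not to abbreviations such as CST.

```python
from datetime import datetime, timedelta, timezone

def flip_offset(dt):
    """Return the same wall-clock time with the sign of the UTC offset reversed."""
    return dt.replace(tzinfo=timezone(-dt.utcoffset()))

# The first parsed value exactly as printed above (offset -02:00).
as_parsed = datetime(2016, 9, 22, 11, 4, tzinfo=timezone(timedelta(hours=-2)))
# The same wall-clock time read as "two hours ahead of UTC" (assumed intent).
as_intended = flip_offset(as_parsed)

print(int(as_parsed.timestamp()))    # UNIX timestamp under the -02:00 reading
print(int(as_intended.timestamp()))  # UNIX timestamp under the +02:00 reading
print(as_parsed - as_intended)       # gap between the two readings
```

The two readings differ by four hours, so it is worth checking a known date against its expected timestamp before converting a whole column.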