I have a DataFrame with a column 'DTime' that contains a date followed by a time range:
01JAN2004 00:00-01:00
01JAN2004 01:00-02:00
I tried to parse it with:
pd.to_datetime(df['DTime'], format='%d%b%Y %H:%M-%H:%M')
But this gives:
error: redefinition of group name 'H' as group 6; was group 4
I tried removing the "-%H:%M" part from the format, but then I get an "unconverted data remains" error.
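A possible workaround for the "unconverted data remains" error: by default `pd.to_datetime` requires the format to consume the entire string, but its `exact=False` option lets the format match a substring instead. A minimal sketch, assuming every value starts with the "date start-time" part:

```python
import pandas as pd

# Sample data in the shape shown in the question (column name 'DTime').
df = pd.DataFrame({'DTime': ['01JAN2004 00:00-01:00', '01JAN2004 01:00-02:00']})

# exact=False lets '%d%b%Y %H:%M' match the leading "01JAN2004 00:00" and
# ignore the trailing "-01:00" end time instead of raising an error.
df['Beg'] = pd.to_datetime(df['DTime'], format='%d%b%Y %H:%M', exact=False)
print(df['Beg'].tolist())
```

This only recovers the start of each range; the end time is silently dropped.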
Is there a way to parse this and use the first (starting) hour as the timestamp?
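One straightforward approach, sketched here under the assumption that every value has the exact layout "DDMONYYYY HH:MM-HH:MM": split the string so that each piece passed to `pd.to_datetime` contains only one `%H:%M`, which avoids the duplicate-directive error entirely.

```python
import pandas as pd

df = pd.DataFrame({'DTime': ['01JAN2004 00:00-01:00', '01JAN2004 01:00-02:00']})

# Split "01JAN2004 00:00-01:00" into the date and the time range,
# then split the range into its start and end times.
parts = df['DTime'].str.split(' ', n=1, expand=True)
times = parts[1].str.split('-', expand=True)

# Rebuild "date time" strings, each with a single time, and parse them.
df['Beg'] = pd.to_datetime(parts[0] + ' ' + times[0], format='%d%b%Y %H:%M')
df['End'] = pd.to_datetime(parts[0] + ' ' + times[1], format='%d%b%Y %H:%M')
print(df[['Beg', 'End']])
```

Note that a range crossing midnight (e.g. "23:00-00:00") would get an `End` on the same day, earlier than `Beg`; such rows would need separate handling.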
Answer (score: 2)
import pandas as pd
df = pd.DataFrame([
    '01JAN2004 00:00-01:00',
    '01JAN2004 01:00-02:00'
], columns=['dstr'])
# Named groups for the date, the start time and the end time (raw strings for the regex escapes)
date_regex = r'(?P<date>\d\d\w{3}\d{4})'
beg_regex = r'(?P<beg_hour>\d\d):(?P<beg_min>\d\d)'
end_regex = r'(?P<end_hour>\d\d):(?P<end_min>\d\d)'
regex = '{} {}-{}'.format(date_regex, beg_regex, end_regex)
# One column per named group
d1 = df.dstr.str.extract(regex, expand=True)
for c in ['beg_hour', 'beg_min', 'end_hour', 'end_min']:
    d1[c] = d1[c].astype(int)
# Parse the date once, then add the hours and minutes as timedeltas
day = pd.to_datetime(d1.date, format='%d%b%Y')
pd.concat([
    day + pd.to_timedelta(d1.beg_hour, unit='h') + pd.to_timedelta(d1.beg_min, unit='min'),
    day + pd.to_timedelta(d1.end_hour, unit='h') + pd.to_timedelta(d1.end_min, unit='min')
], axis=1, keys=['Beg', 'End'])
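A slightly shorter variant of the regex approach above, offered as a sketch rather than the answerer's code: capture each time as a single "HH:MM" group and let `pd.to_timedelta` parse "HH:MM:SS" strings directly, so no integer conversion is needed. The group names `beg` and `end` are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({'dstr': ['01JAN2004 00:00-01:00', '01JAN2004 01:00-02:00']})

# Capture the date and each time as whole "HH:MM" strings.
regex = r'(?P<date>\d{2}[A-Za-z]{3}\d{4}) (?P<beg>\d{2}:\d{2})-(?P<end>\d{2}:\d{2})'
d1 = df['dstr'].str.extract(regex, expand=True)

# Parse the date once; append ":00" seconds so to_timedelta reads "HH:MM:SS".
day = pd.to_datetime(d1['date'], format='%d%b%Y')
result = pd.DataFrame({
    'Beg': day + pd.to_timedelta(d1['beg'] + ':00'),
    'End': day + pd.to_timedelta(d1['end'] + ':00'),
})
print(result)
```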