Datetime conversion ValueError in pandas

Time: 2019-10-24 04:08:12

Tags: python pandas datetime

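I have a dataset with out-of-range times (24:00:00 to 26:18:00), and I would like to know the best way to handle this kind of data in Python.

I tried to convert the column from object to datetime with the following code: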

stopTimeArrDep['departure_time'] = pd.to_datetime(
    stopTimeArrDep['departure_time'], format='%H:%M:%S')

But I get this error:

ValueError: time data '24:04:00' does not match format '%H:%M:%S' (match)

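So I tried adding errors='coerce' to avoid the error. However, I ended up with empty (NaT) values for the out-of-range times and an unwanted date added to every remaining row.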

stopTimeArrDep['departure_time'] = pd.to_datetime(
    stopTimeArrDep['departure_time'], format='%H:%M:%S', errors='coerce')

Sample output:

original_col    converted_col
23:45:00        1/1/00 23:45:00
23:51:00        1/1/00 23:51:00
24:04:00
23:42:00        1/1/00 23:42:00
26:01:00

Any suggestions on the best way to solve this problem would be appreciated. Thanks.

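1 answer:

Answer 0: (score: 0)

Solution

If it suits your data, you can treat original_col as an elapsed time interval rather than a time of day. Parse each value into a datetime.timedelta, then add that timedelta to a datetime.datetime to get a datetime object, from which you can extract the date and the time separately.

Example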

from datetime import datetime, timedelta

time_string = "20:30:20"

t = datetime.utcnow()
print('t: {}'.format(t))
HH, MM, SS = [int(x) for x in time_string.split(':')]
dt = timedelta(hours=HH, minutes=MM, seconds=SS)
print('dt: {}'.format(dt))
t2 = t + dt
print('t2: {}'.format(t2))
print('t2.date: {} | t2.time: {}'.format(str(t2.date()), str(t2.time()).split('.')[0]))

Output

t: 2019-10-24 04:43:08.255027
dt: 20:30:20
t2: 2019-10-25 01:13:28.255027
t2.date: 2019-10-25 | t2.time: 01:13:28
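
The example above adds the interval to the current UTC time, so the result changes on every run. If the values are really offsets from the start of a known day (an assumption about the data, where 24:04:00 would mean four minutes past midnight on the following day), anchoring to midnight of that day may be more useful. A minimal sketch of that idea follows; the helper name parse_extended_time and the service_date value are hypothetical and chosen only for illustration.

```python
from datetime import date, datetime, time, timedelta


def parse_extended_time(time_string, service_date):
    """Turn 'HH:MM:SS' (hours may exceed 23) into a datetime on or after service_date."""
    hours, minutes, seconds = (int(part) for part in time_string.split(":"))
    offset = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    # Anchor at midnight of the given day so the result does not depend on "now".
    midnight = datetime.combine(service_date, time.min)
    return midnight + offset


# Assumed anchor date, for illustration only.
service_date = date(2019, 10, 24)
for raw in ["23:45:00", "24:04:00", "26:01:00"]:
    print(raw, "->", parse_extended_time(raw, service_date))
```

With this approach, hours of 24 or more simply roll over into the next calendar day instead of raising a ValueError.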
  

For your use case

import pandas as pd
from datetime import datetime, timedelta

# Convert an 'HH:MM:SS' string (hours may exceed 23) into a timedelta
def process_row(time_string):
    HH, MM, SS = [int(x) for x in time_string.split(':')]
    return timedelta(hours=HH, minutes=MM, seconds=SS)

# Make dummy data
original_col = ["23:45:00", "23:51:00", "24:04:00", "23:42:00", "26:01:00"]
df = pd.DataFrame({'original_col': original_col})

# Process the DataFrame
df['dt'] = df['original_col'].apply(process_row)
df['t'] = datetime.utcnow()
df['t2'] = df['dt'] + df['t']
# Extract the date from the timestamp
df['Date'] = df['t2'].dt.date
# Extract the time from the timestamp
df['Time'] = df['t2'].dt.time
df


Using pandas.to_datetime()

pd.to_datetime(df['t2'], format='%H:%M:%S',errors='coerce')

Output

0   2019-10-25 09:38:39.349410
1   2019-10-25 09:44:39.349410
2   2019-10-25 09:57:39.349410
3   2019-10-25 09:35:39.349410
4   2019-10-25 11:54:39.349410
Name: t2, dtype: datetime64[ns]
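
As a possible alternative to the row-by-row function, pandas can parse these strings directly with pd.to_timedelta, which accepts hour values of 24 or more. The sketch below assumes the times are offsets from midnight of a known day; the service_date value and the column names offset, timestamp, date and time are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"original_col": ["23:45:00", "23:51:00", "24:04:00", "23:42:00", "26:01:00"]}
)

# pd.to_timedelta parses 'HH:MM:SS' strings, including hours of 24 or more.
df["offset"] = pd.to_timedelta(df["original_col"])

# Assumed anchor date; replace it with the real day the times belong to.
service_date = pd.Timestamp("2019-10-24")
df["timestamp"] = service_date + df["offset"]

# Split the result back into a calendar date and a wall-clock time.
df["date"] = df["timestamp"].dt.date
df["time"] = df["timestamp"].dt.time
print(df)
```

Keeping the offset column as a timedelta also preserves the original ordering of the times, since 26:01:00 still sorts after 23:45:00.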

References

  1. How to construct a timedelta object from a simple string