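I have a dataset with invalid times (24:00:00 to 26:18:00), and I would like to know the best way to handle this kind of data in Python.
I tried to convert the column from object to datetime with the following code: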
stopTimeArrDep['departure_time'] = pd.to_datetime(stopTimeArrDep['departure_time']\
,format='%H:%M:%S')
But I get this error:
ValueError: time data '24:04:00' does not match format '%H:%M:%S' (match)
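So I tried adding errors='coerce' to avoid the error. But I end up with empty values for the out-of-range rows and an unwanted date added to every other row.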
stopTimeArrDep['departure_time'] = pd.to_datetime(stopTimeArrDep['departure_time']\
,format='%H:%M:%S',errors='coerce')
Sample output:
original_col converted_col
23:45:00 1/1/00 23:45:00
23:51:00 1/1/00 23:51:00
24:04:00
23:42:00 1/1/00 23:42:00
26:01:00
Any suggestions on the best way to solve this problem would be appreciated. Thanks.
Answer 0 (score: 0)
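If it suits your case, you can treat original_col as an elapsed time interval rather than a time of day. You can parse it into a datetime.timedelta and then add that datetime.timedelta to a datetime.datetime to get a datetime object, from which you can finally extract the date and the time separately.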
from datetime import datetime, timedelta
time_string = "20:30:20"
t = datetime.utcnow()
print('t: {}'.format(t))
HH, MM, SS = [int(x) for x in time_string.split(':')]
dt = timedelta(hours=HH, minutes=MM, seconds=SS)
print('dt: {}'.format(dt))
t2 = t + dt
print('t2: {}'.format(t2))
print('t2.date: {} | t2.time: {}'.format(str(t2.date()), str(t2.time()).split('.')[0]))
Output:
t: 2019-10-24 04:43:08.255027
dt: 20:30:20
t2: 2019-10-25 01:13:28.255027
t2.date: 2019-10-25 | t2.time: 01:13:28
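The example above adds the interval to the current UTC time, which only demonstrates the arithmetic. As a minimal sketch, assuming the times are counted from midnight of some known day (the name service_date and the helper resolve_service_time are hypothetical, not from the question), the same idea can be anchored to midnight so that hours past 23 roll over into the next calendar day:

```python
from datetime import datetime, timedelta


def resolve_service_time(service_date, time_string):
    """Return the real datetime for an 'HH:MM:SS' string whose hour may exceed 23."""
    hours, minutes, seconds = (int(part) for part in time_string.split(":"))
    # Start from midnight of the (assumed) day the times are counted from.
    midnight = datetime.combine(service_date, datetime.min.time())
    return midnight + timedelta(hours=hours, minutes=minutes, seconds=seconds)


service_date = datetime(2019, 10, 24).date()  # assumed example date
print(resolve_service_time(service_date, "23:45:00"))  # 2019-10-24 23:45:00
print(resolve_service_time(service_date, "26:01:00"))  # 2019-10-25 02:01:00
```

With a midnight anchor, "26:01:00" resolves to 02:01:00 on the following day, which is usually what such over-24-hour values are meant to express.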
For your use case:
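import pandas as pd
from datetime import datetime, timedelta

# Define Custom Function
def process_row(time_string):
    # Split 'HH:MM:SS' into integers; the hour part may be 24 or more
    HH, MM, SS = [int(x) for x in time_string.split(':')]
    dt = timedelta(hours=HH, minutes=MM, seconds=SS)
    return dt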
# Make Dummy Data
original_col = ["23:45:00", "23:51:00", "24:04:00", "23:42:00", "26:01:00"]
df = pd.DataFrame({'original_col': original_col, 'dt': None})
# Process Dataframe
df['dt'] = df.apply(lambda x: process_row(x['original_col']), axis=1)
df['t'] = datetime.utcnow()
df['t2'] = df['dt'] + df['t']
# extracting date from timestamp
df['Date'] = [datetime.date(d) for d in df['t2']]
# extracting time from timestamp
df['Time'] = [datetime.time(d) for d in df['t2']]
df
Using pandas.to_datetime():
pd.to_datetime(df['t2'], format='%H:%M:%S',errors='coerce')
Output:
0 2019-10-25 09:38:39.349410
1 2019-10-25 09:44:39.349410
2 2019-10-25 09:57:39.349410
3 2019-10-25 09:35:39.349410
4 2019-10-25 11:54:39.349410
Name: t2, dtype: datetime64[ns]
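As a possible alternative to the row-by-row apply, pandas can parse such strings directly with pd.to_timedelta, which accepts hour values above 23. The sketch below assumes all times belong to a single known day; service_day and the column names elapsed and departure are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"original_col": ["23:45:00", "23:51:00", "24:04:00", "23:42:00", "26:01:00"]}
)

# Parse 'HH:MM:SS' as an elapsed duration; '24:04:00' becomes 1 day 00:04:00.
df["elapsed"] = pd.to_timedelta(df["original_col"])

# Assumed: every time is counted from midnight of this one day.
service_day = pd.Timestamp("2019-10-24")
df["departure"] = service_day + df["elapsed"]

print(df)
```

The departure column is a regular datetime64 column, so df["departure"].dt.date and df["departure"].dt.time give the date and time parts without a list comprehension.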