Following the answers here and here, I first converted the DataFrame columns to time objects:
data['start'] = pd.to_datetime(data_session['start'], format = '%H:%M:%S').dt.time
data['end'] = pd.to_datetime(data['end'], format = '%H:%M:%S').dt.time
data['minutes'] = (data['end'] - data['start']).dt.minutes
data['Hour'] = data['start'].dt.hour
I get this error:
TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'
I checked the DataFrame info:
data.info()
start 10000 non-null object
end 10000 non-null object
The columns are still of object dtype. Why were they not converted to datetime64? And why can't I use the dt accessor on them?
My last attempt was:
data['start'] = pd.to_datetime(data_session['start'], format = '%H:%M:%S')
data['end'] = pd.to_datetime(data['end'], format = '%H:%M:%S')
data['minutes'] = (data['end'] - data['start'])
data.info()
start 10000 non-null datetime64[ns]
end 10000 non-null datetime64[ns]
This solution only partly works: I do get the time difference, but both the start and end columns now carry an extra date.
e.g.: 06:10:10 -> 1900-01-01 06:10:10
My goal is:
Answer 0 (score: 1)
I think you need to convert the columns with to_timedelta,
and then convert to minutes and hours:
data = pd.DataFrame({'end': ['12:01:04', '15:21:00'],
                     'start': ['10:01:04', '5:41:00']})
data['start'] = pd.to_timedelta(data['start'])
data['end'] = pd.to_timedelta(data['end'])
data['minutes'] = (data['end'] - data['start']).dt.total_seconds() / 60
# hours component of the start time (astype('timedelta64[h]') is rejected by pandas 2.x)
data['Hour'] = data['start'].dt.components['hours']
print(data)
        end    start  minutes  Hour
0  12:01:04 10:01:04      120    10
1  15:21:00 05:41:00      580     5
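As a rough illustration of why the original attempt failed, the sketch below compares the two conversions on the same sample data. It assumes nothing beyond pandas and the standard library; the variable names (`as_time`, `as_delta`, and so on) are made up for this example.

```python
import datetime
import pandas as pd

data = pd.DataFrame({'end': ['12:01:04', '15:21:00'],
                     'start': ['10:01:04', '5:41:00']})

# .dt.time yields plain Python datetime.time objects, so the column
# falls back to object dtype and loses the .dt accessor
as_time = pd.to_datetime(data['start'], format='%H:%M:%S').dt.time
time_dtype = as_time.dtype
first_is_time = isinstance(as_time.iloc[0], datetime.time)

# datetime.time does not define subtraction, hence the TypeError
try:
    as_time.iloc[0] - as_time.iloc[1]
    subtraction_failed = False
except TypeError:
    subtraction_failed = True

# to_timedelta keeps a native timedelta64 dtype, so arithmetic and .dt work
as_delta = pd.to_timedelta(data['start'])
delta_dtype = as_delta.dtype

print(time_dtype, first_is_time, subtraction_failed, delta_dtype)
```

So the object dtype is not a failed conversion: pandas has no native time-of-day dtype, and a timedelta measured from midnight is probably the closest stand-in when you need arithmetic.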
Answer 1 (score: 1)
Here is one way using operator.attrgetter. Data from @jezrael.
from operator import attrgetter

# convert both columns to timedelta
for col in ['start', 'end']:
    data[col] = pd.to_timedelta(data[col])

# .seconds is the seconds component of each Timedelta (the days part is ignored)
data['minutes'] = (data['end'] - data['start']).apply(attrgetter('seconds')) / 60
data['hour'] = (data['start'].apply(attrgetter('seconds')) / 60**2).astype(int)
print(data)
        end    start  minutes  hour
0  12:01:04 10:01:04    120.0    10
1  15:21:00 05:41:00    580.0     5
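One thing to watch when choosing between the `seconds` attribute and `total_seconds()`: they agree only while the difference stays between 0 and 24 hours. The sketch below uses a hypothetical row, not taken from the question, in which the session crosses midnight so that end is smaller than start.

```python
import pandas as pd

# Hypothetical row where the session crosses midnight, so end < start
data = pd.DataFrame({'start': ['23:30:00'], 'end': ['00:15:00']})
for col in ['start', 'end']:
    data[col] = pd.to_timedelta(data[col])

diff = data['end'] - data['start']  # stored as -1 days +00:45:00

# .seconds is only the seconds component (0..86399); the -1 day part is dropped
minutes_component = diff.iloc[0].seconds / 60

# total_seconds() includes the day part, so the negative sign is preserved
minutes_total = diff.dt.total_seconds().iloc[0] / 60

print(minutes_component, minutes_total)
```

Here `seconds` gives 45.0, which happens to be the wrapped-around duration, while `total_seconds()` gives -1395.0. Which one is right depends on whether a negative difference in your data means "next day" or a data error.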