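I need to calculate the difference in hours between two dates (format: YYYY-MM-DDTHH:MM:SS). I may also convert the data from a huge Excel file into the format YYYY-MM-DD HH:MM:SS. What is the most efficient way to do this in Python? I have tried datetime/time objects (TypeError: expected string or buffer), Timestamp (ValueError), and a DataFrame (does not give the result in hours).
Excel file: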
Order_Date Received_Customer Column3
2000-10-06T13:00:58 2000-11-06T13:00:58 1
2000-10-21T15:40:15 2000-12-27T10:09:29 2
2000-10-23T10:09:29 2000-10-26T10:09:29 3
..... ....
datetime/time object code (TypeError: expected string or buffer):
import pandas as pd
import time as t
data=pd.read_excel('/path/file.xlsx')
s1 = (data,['Order_Date'])
s2 = (data,['Received_Customer'])
s1Time = t.strptime(s1, "%Y:%m:%d:%H:%M:%S")
s2Time = t.strptime(s2, "%Y:%m:%d:%H:%M:%S")
deltaInHours = (t.mktime(s2Time) - t.mktime(s1Time))
print deltaInHours, "hours"
Timestamp code (ValueError):
import pandas as pd
import datetime as dt
data=pd.read_excel('/path/file.xlsx')
df = pd.DataFrame(data,columns=['Order_Date','Received_Customer'])
df.to = [pd.Timestamp('Order_Date')]
df.fr = [pd.Timestamp('Received_Customer')]
(df.fr-df.to).astype('timedelta64[h]')
DataFrame code (does not return the desired result):
import pandas as pd
data=pd.read_excel('/path/file.xlsx')
df = pd.DataFrame(data,columns=['Order_Date','Received_Customer'])
df['Order_Date'] = pd.to_datetime(df['Order_Date'])
df['Received_Customer'] = pd.to_datetime(df['Received_Customer'])
answer = df.dropna()['Order_Date'] - df.dropna()['Received_Customer']
answer.astype('timedelta64[h]')
print(answer)
Output:
0 24 days 16:38:07
1 0 days 00:00:00
2 20 days 12:39:52
dtype: timedelta64[ns]
It should look like this:
0 592 hour
1 0 hour
2 492 hour
Is there another way to convert timedelta64[ns] to hours besides answer.astype('timedelta64[h]')?
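Answer 0 (score: 1)
In each of your attempts you mixed up data types and methods. I don't have time to explain each error in detail, but I'd like to help by offering a (possibly non-optimal) solution. I built it on your earlier attempts and combined it with what I learned from other questions, such as: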
Convert a timedelta to days, hours and minutes
Get total number of hours from a Pandas Timedelta?
Note that I'm using Python 3. I hope my solution points you in the right direction. Here it is:
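import pandas as pd
import numpy as np

# Load the sheet; the first row holds the column names
data = pd.read_excel('C:\\Users\\nrieble\\Desktop\\check.xlsx', header=0)
# Parse each cell, skipping empty or placeholder values (fewer than 5 characters)
start = [pd.to_datetime(e) for e in data['Order_Date'] if len(str(e)) > 4]
end = [pd.to_datetime(e) for e in data['Received_Customer'] if len(str(e)) > 4]
# Element-wise difference; assumes both lists ended up the same length
delta = np.asarray(end) - np.asarray(start)
# Dividing by a one-hour timedelta gives the number of hours as a float
deltainhours = [e / np.timedelta64(1, 'h') for e in delta]
print(deltainhours, "hours")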
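As for the question about an alternative to answer.astype('timedelta64[h]'): one thing worth noting is that astype returns a new Series, and the DataFrame attempt never assigns that result, so print(answer) still shows the original timedelta64[ns] values. That attempt also subtracts Received_Customer from Order_Date, which gives negative durations when the order comes first. A vectorized version might look like the minimal sketch below. The small in-memory DataFrame is a hypothetical stand-in for the Excel sheet (built from the three sample rows in the question), and the "hours" column name is an assumption for illustration.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(...): the three sample rows from the question
df = pd.DataFrame({
    "Order_Date": ["2000-10-06T13:00:58", "2000-10-21T15:40:15", "2000-10-23T10:09:29"],
    "Received_Customer": ["2000-11-06T13:00:58", "2000-12-27T10:09:29", "2000-10-26T10:09:29"],
})

# Parse both columns as datetimes (ISO 8601 strings with a 'T' separator are understood)
order = pd.to_datetime(df["Order_Date"])
received = pd.to_datetime(df["Received_Customer"])

# Later date minus earlier date gives a timedelta64[ns] Series
delta = received - order

# Option 1: divide by a one-hour Timedelta to get float hours
df["hours"] = delta / pd.Timedelta(hours=1)

# Option 2: total seconds divided by 3600 gives the same numbers
hours_alt = delta.dt.total_seconds() / 3600

print(df)
```

Both options keep fractional hours; if whole hours are wanted, the result can be rounded or floored afterwards. Rows with missing dates come out as NaN rather than having to be filtered by string length.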