Date difference in hours (Excel data import)?

Time: 2016-04-19 19:12:35

Tags: python pandas datetime dataframe timestamp

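I need to calculate the difference in hours between two dates (format: YYYY-MM-DDTHH:MM:SS). I could also convert the data from the huge Excel file to the format YYYY-MM-DD HH:MM:SS. What is the most efficient way to do this in Python? I have tried datetime/time objects (TypeError: expected string or buffer), Timestamp (ValueError), and a DataFrame (does not give the result in hours).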

Excel file:

Order_Date             Received_Customer   Column3
2000-10-06T13:00:58    2000-11-06T13:00:58    1
2000-10-21T15:40:15    2000-12-27T10:09:29    2
2000-10-23T10:09:29    2000-10-26T10:09:29    3
.....                  ....

datetime/time object code (TypeError: expected string or buffer):

import pandas as pd
import time as t

data=pd.read_excel('/path/file.xlsx')

s1 = (data,['Order_Date'])
s2 = (data,['Received_Customer'])

s1Time = t.strptime(s1, "%Y:%m:%d:%H:%M:%S")
s2Time = t.strptime(s2, "%Y:%m:%d:%H:%M:%S")

deltaInHours = (t.mktime(s2Time) - t.mktime(s1Time))

print deltaInHours, "hours"

Timestamp code (ValueError):

import pandas as pd
import datetime as dt

data=pd.read_excel('/path/file.xlsx')

df = pd.DataFrame(data,columns=['Order_Date','Received_Customer'])
df.to = [pd.Timestamp('Order_Date')]
df.fr = [pd.Timestamp('Received_Customer')]
(df.fr-df.to).astype('timedelta64[h]')

DataFrame code (does not return the desired result):

import pandas as pd

data=pd.read_excel('/path/file.xlsx')

df = pd.DataFrame(data,columns=['Order_Date','Received_Customer'])

df['Order_Date'] = pd.to_datetime(df['Order_Date'])
df['Received_Customer'] = pd.to_datetime(df['Received_Customer'])

answer = df.dropna()['Order_Date'] - df.dropna()['Received_Customer']

answer.astype('timedelta64[h]')

print(answer)

Output:

0   24 days 16:38:07
1    0 days 00:00:00
2   20 days 12:39:52
dtype: timedelta64[ns]

It should look like this:

0   592 hour
1   0   hour
2   492 hour

Is there another way to convert timedelta64[ns] to hours, other than answer.astype('timedelta64[h]')?

1 Answer:

Answer 0: (score: 1)

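In each of your attempts you mixed up data types and methods. I don't have time to explain each error in detail, but I'd like to help by offering a (possibly non-optimal) solution. I built it on your earlier attempts and combined it with what I learned from other questions, such as: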

Convert a timedelta to days, hours and minutes

Get total number of hours from a Pandas Timedelta?

Note that I am using Python 3. I hope my solution points you in the right direction. Here it is:

import pandas as pd
import numpy as np

# Read the sheet; the first row holds the column names.
data = pd.read_excel('C:\\Users\\nrieble\\Desktop\\check.xlsx', header=0)

# Parse each cell to a Timestamp, skipping empty cells (str(NaN) is only 3 characters long).
start = [pd.to_datetime(e) for e in data['Order_Date'] if len(str(e)) > 4]
end = [pd.to_datetime(e) for e in data['Received_Customer'] if len(str(e)) > 4]

# Element-wise difference, then divide by one hour to get float hours.
delta = np.asarray(end) - np.asarray(start)
deltainhours = [e / np.timedelta64(1, 'h') for e in delta]

print(deltainhours, "hours")
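
As a vectorized alternative, here is a minimal sketch that stays inside pandas. It assumes both columns hold ISO-style strings like the sample rows in the question. The helper name `hours_between` and the inline sample frame are hypothetical and stand in for the real Excel file. Note also that `Series.astype` returns a new Series rather than changing `answer` in place, which is why the last attempt in the question still printed `timedelta64[ns]`. Dividing by a one-hour `Timedelta` yields float hours directly.

```python
import pandas as pd

# Sample rows copied from the question; in practice this frame
# would come from pd.read_excel on the real file.
df = pd.DataFrame({
    "Order_Date": ["2000-10-06T13:00:58", "2000-10-21T15:40:15", "2000-10-23T10:09:29"],
    "Received_Customer": ["2000-11-06T13:00:58", "2000-12-27T10:09:29", "2000-10-26T10:09:29"],
})


def hours_between(frame, start_col, end_col):
    """Hypothetical helper: elapsed hours from start_col to end_col, as floats."""
    start = pd.to_datetime(frame[start_col])
    end = pd.to_datetime(frame[end_col])
    # Timedelta Series divided by a one-hour Timedelta gives a float Series.
    return (end - start) / pd.Timedelta(hours=1)


hours = hours_between(df, "Order_Date", "Received_Customer")
print(hours)
```

Rows with a missing date come out as NaN instead of being dropped, so the result stays aligned with the original rows. `(end - start).dt.total_seconds() / 3600` is an equivalent spelling.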