Question

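I have a DataFrame in which I compute a blockTime column as the difference between endDate and startDate. It gives me results like 0 days 01:45:00, but I only need the hours as a decimal number. In this case that would be 1.75.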

My df is as follows:

import pandas as pd

data = {'endDate': ['01/10/2020 15:23', '01/10/2020 16:31', '01/10/2020 16:20', '01/10/2020 11:00'],
      'startDate': ['01/10/2020 13:38', '01/10/2020 14:49', '01/10/2020 14:30','01/10/2020 14:30']
      }

df = pd.DataFrame(data, columns = ['endDate','startDate'])

df['endDate'] = pd.to_datetime(df['endDate'])
df['startDate'] = pd.to_datetime(df['startDate'])

df['blockTime'] = (df['endDate'] - df['startDate'])

df = df.reindex(columns= ['startDate', 'endDate', 'blockTime'])

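The desired result would be a DataFrame like the one below. Note that if a negative value is produced, it needs to be flagged as incorrect in some way. I think -999 might be ideal.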

startDate           endDate               blockTime                 desiredResult
2020-01-10 13:38:00 2020-01-10 15:23:00   0 days 01:45:00           1.75
2020-01-10 14:49:00 2020-01-10 16:31:00   0 days 01:42:00           1.70
2020-01-10 14:30:00 2020-01-10 16:20:00   0 days 01:50:00           1.83
2020-01-10 14:30:00 2020-01-10 11:00:00  -1 days +20:30:00          -999.00

Answer 1

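That is simply how timedelta objects are represented when the DataFrame is printed. If you only want to store the number of hours as a float instead of the whole timedelta object, timedelta objects have a total_seconds() function, which you can use like this: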

def td2hours(tdobject):
    if tdobject.total_seconds() < 0:
        return -999
    return tdobject.total_seconds() / 3600

df['blockTime']= (df['endDate'] - df['startDate']).apply(td2hours)

Alternatively, as Gustavo suggested in the comments, you can avoid apply(). This is faster when you have a large dataset:

blockTime = ((df['endDate'] - df['startDate']).dt.total_seconds() / 3600).to_numpy()
blockTime[blockTime < 0] = -999
df['blockTime'] = blockTime

Output:

              endDate           startDate   blockTime
0 2020-01-10 15:23:00 2020-01-10 13:38:00    1.750000
1 2020-01-10 16:31:00 2020-01-10 14:49:00    1.700000
2 2020-01-10 16:20:00 2020-01-10 14:30:00    1.833333
3 2020-01-10 11:00:00 2020-01-10 14:30:00 -999.000000
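
As a variation on the vectorized approach, here is a minimal sketch that divides the timedelta column by pd.Timedelta(hours=1) and uses Series.where to flag negative durations, then rounds to two decimals to match the desired result in the question. The explicit month-first format string is an assumption inferred from the printed dates (2020-01-10), and the names fmt and hours are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'startDate': ['01/10/2020 13:38', '01/10/2020 14:49', '01/10/2020 14:30', '01/10/2020 14:30'],
    'endDate': ['01/10/2020 15:23', '01/10/2020 16:31', '01/10/2020 16:20', '01/10/2020 11:00'],
})

# Assumption: the strings are month/day/year, as the printed output in the question suggests.
fmt = '%m/%d/%Y %H:%M'
df['startDate'] = pd.to_datetime(df['startDate'], format=fmt)
df['endDate'] = pd.to_datetime(df['endDate'], format=fmt)

# Dividing a timedelta64[ns] Series by a one-hour Timedelta yields float hours.
hours = (df['endDate'] - df['startDate']) / pd.Timedelta(hours=1)

# Keep non-negative durations, replace everything else with the -999 sentinel,
# then round to two decimals.
df['blockTime'] = hours.where(hours >= 0, -999).round(2)

print(df)
```

One thing to be aware of with this sketch: rows where either date is missing (NaT) produce NaN hours, and since NaN >= 0 is False they would also be set to -999. If missing dates should stay missing, use hours.mask(hours < 0, -999) instead.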

Convert timedelta64[ns] to decimal
