Question

I have a DataFrame like this:

import pandas as pd

index = ['2018-02-17 00:30:00', '2018-02-17 07:00:00',
         '2018-02-17 13:00:00', '2018-02-17 19:00:00',
         '2018-02-18 00:00:00', '2018-02-18 07:00:00',
         '2018-02-18 10:30:00', '2018-02-18 13:00:00']

df = pd.DataFrame({'col': list(range(len(index)))})
df.index = pd.to_datetime(index)

                     col
2018-02-17 00:30:00    0
2018-02-17 07:00:00    1
2018-02-17 13:00:00    2
2018-02-17 19:00:00    3
2018-02-18 00:00:00    4
2018-02-18 07:00:00    5
2018-02-18 10:30:00    6
2018-02-18 13:00:00    7

I want to add a column holding the elapsed time since the first timestamp, in hours, so the result I am after looks like this:

                     col  time_range
2018-02-17 00:30:00    0         0.0
2018-02-17 07:00:00    1         6.5
2018-02-17 13:00:00    2        12.5
2018-02-17 19:00:00    3        18.5
2018-02-18 00:00:00    4        23.5
2018-02-18 07:00:00    5        30.5
2018-02-18 10:30:00    6        34.0
2018-02-18 13:00:00    7        36.5

This is how I currently do it:

df['time_range'] = [(ti - df.index[0]).delta / (10 ** 9 * 60 * 60) for ti in df.index]

Is there a smarter (i.e. vectorized/built-in) way?

Answer 1

Use:

df['new'] = (df.index - df.index[0]).total_seconds() / 3600

Or, with NumPy imported as np:

df['new'] = (df.index - df.index[0]) / np.timedelta64(1, 'h')

print(df)
                     col   new
2018-02-17 00:30:00    0   0.0
2018-02-17 07:00:00    1   6.5
2018-02-17 13:00:00    2  12.5
2018-02-17 19:00:00    3  18.5
2018-02-18 00:00:00    4  23.5
2018-02-18 07:00:00    5  30.5
2018-02-18 10:30:00    6  34.0
2018-02-18 13:00:00    7  36.5
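
For reference, here is a self-contained sketch that puts the pieces above together. It rebuilds the question's DataFrame and computes the hours with both expressions from this answer. The variable names elapsed, hours_a and hours_b are illustrative and do not come from the original post, and the column is named time_range to match the question's desired output.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
index = pd.to_datetime([
    '2018-02-17 00:30:00', '2018-02-17 07:00:00',
    '2018-02-17 13:00:00', '2018-02-17 19:00:00',
    '2018-02-18 00:00:00', '2018-02-18 07:00:00',
    '2018-02-18 10:30:00', '2018-02-18 13:00:00',
])
df = pd.DataFrame({'col': list(range(len(index)))}, index=index)

# Subtracting the first timestamp from a DatetimeIndex gives a TimedeltaIndex.
elapsed = df.index - df.index[0]

# Option 1: convert to seconds, then to hours.
hours_a = elapsed.total_seconds() / 3600

# Option 2: divide by a one-hour timedelta.
hours_b = elapsed / np.timedelta64(1, 'h')

# Both options are vectorized and should agree.
df['time_range'] = hours_a
print(df)
```

Both expressions work on the whole index at once, so no Python-level loop over the timestamps is needed.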
