23-25 hour days on a pandas DataFrame DatetimeIndex

Time: 2019-08-22 17:10:52

Tags: python pandas dataframe

I have a pandas DataFrame indexed by a DatetimeIndex. The frequency of the index is variable, but it is mostly minute-based sampling.

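Because of a database problem, daylight saving time was not stored properly in the index. As a result, on certain months/days I have duplicate index values. Is there a way (without using time zones) to handle 23-25 hour days in pandas so that I can follow the records in linear order?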

Here is a small example of my problem:

DatetimeIndex(['2014-03-12 22:59:59', '2014-03-12 22:59:59',
           '2014-03-12 23:00:59', '2014-03-12 23:00:59',
           '2014-03-12 23:01:59', '2014-03-12 23:02:59',
           '2014-03-12 23:02:59', '2014-03-12 23:03:59',
           '2014-03-12 23:03:59', '2014-03-12 23:04:59',
           '2014-03-12 23:04:59', '2014-03-12 23:05:59',
           '2014-03-12 23:06:59', '2014-03-12 23:06:59',
           '2014-03-12 23:07:59', '2014-03-12 23:07:59',
           '2014-03-12 23:08:59', '2014-03-12 23:09:59',
           '2014-03-12 23:09:59', '2014-03-12 23:10:59',
           '2014-03-12 23:10:59', '2014-03-12 23:11:59',
           '2014-03-12 23:11:59', '2014-03-12 23:12:59',
           '2014-03-12 23:13:59', '2014-03-12 23:13:59',
           '2014-03-12 23:14:59', '2014-03-12 23:14:59',
           '2014-03-12 23:15:59', '2014-03-12 23:16:59',
           '2014-03-12 23:16:59', '2014-03-12 23:17:59',
           '2014-03-12 23:17:59', '2014-03-12 23:18:59',
           '2014-03-12 23:19:59', '2014-03-12 23:19:59',
           '2014-03-12 23:20:59', '2014-03-12 23:20:59',
           '2014-03-12 23:21:59', '2014-03-12 23:22:59',
           '2014-03-12 23:22:59', '2014-03-12 23:23:59',
           '2014-03-12 23:24:59', '2014-03-12 23:24:59',
           '2014-03-12 23:25:59', '2014-03-12 23:26:59',
           '2014-03-12 23:26:59', '2014-03-12 23:27:59',
           '2014-03-12 23:27:59', '2014-03-12 23:28:59',
           '2014-03-12 23:28:59', '2014-03-12 23:29:59',
           '2014-03-12 23:30:59', '2014-03-12 23:30:59',
           '2014-03-12 23:31:59', '2014-03-12 23:31:59',
           '2014-03-12 23:32:59', '2014-03-12 23:33:59',
           '2014-03-12 23:33:59', '2014-03-12 23:34:59',
           '2014-03-12 23:34:59', '2014-03-12 23:35:59',
           '2014-03-12 23:36:59', '2014-03-12 23:36:59',
           '2014-03-12 23:37:59', '2014-03-12 23:38:59',
           '2014-03-12 23:38:59', '2014-03-12 23:39:59',
           '2014-03-12 23:40:59', '2014-03-12 23:40:59',
           '2014-03-12 23:41:59', '2014-03-12 23:42:59',
           '2014-03-12 23:42:59', '2014-03-12 23:43:59',
           '2014-03-12 23:44:59', '2014-03-12 23:44:59',
           '2014-03-12 23:45:59', '2014-03-12 23:46:59',
           '2014-03-12 23:46:59', '2014-03-12 23:47:59',
           '2014-03-12 23:48:59', '2014-03-12 23:48:59',
           '2014-03-12 23:49:59', '2014-03-12 23:49:59',
           '2014-03-12 23:50:59', '2014-03-12 23:51:59',
           '2014-03-12 23:51:59', '2014-03-12 23:52:59',
           '2014-03-12 23:52:59', '2014-03-12 23:54:59',
           '2014-03-12 23:56:59', '2014-03-12 23:58:59',
           '2014-03-12 23:54:00', '2014-03-12 23:55:59',
           '2014-03-12 23:56:59', '2014-03-12 23:57:59',
           '2014-03-12 23:59:59'],
          dtype='datetime64[ns]', name='Timestamp', freq=None)  

1 Answer:

Answer 0: (score: 0)

Your problem is that a DatetimeIndex is immutable, so you cannot modify it in place; you have to overwrite it with a new index.
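To illustrate the immutability point, here is a minimal sketch. The two-row frame and the `value` column are made up for the example; only the timestamps come from the question.

```python
import pandas as pd

# Hypothetical two-row frame with a duplicated timestamp
idx = pd.DatetimeIndex(['2014-03-12 22:59:59', '2014-03-12 22:59:59'])
df = pd.DataFrame({'value': [1, 2]}, index=idx)

# Assigning to a single position of the index raises TypeError
try:
    df.index[1] = pd.Timestamp('2014-03-12 23:59:59')
    in_place_worked = True
except TypeError:
    in_place_worked = False

# Replacing the whole index with a new one is allowed
df.index = pd.DatetimeIndex(['2014-03-12 22:59:59', '2014-03-12 23:59:59'])
```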

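One solution could be to "unroll" the index, so that it still has the same number of time steps but every second timestamp is pushed forward/backward by one hour.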

In the code below, index refers to the DatetimeIndex from your question.

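import pandas as pd

df = pd.DataFrame(index=index)

# Every second timestamp starting from the first (positions 0, 2, 4, ...), left unchanged
first_step = df.index[::2]

# Every second timestamp starting from the second (positions 1, 3, 5, ...), pushed forward one hour
second_step = df.index[1::2].shift(periods=1, freq=pd.Timedelta(hours=1))

# Interleave the two halves again so each timestamp stays on its original row
new_values = df.index.to_numpy().copy()
new_values[::2] = first_step.to_numpy()
new_values[1::2] = second_step.to_numpy()

# The index is immutable, so overwrite it with a new one
df.index = pd.DatetimeIndex(new_values, name=df.index.name)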
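Shifting every second value assumes the duplicates alternate strictly, which is not quite true for the sample in the question (for example, 23:01:59 appears only once). A possible variation, sketched here on a shortened copy of that index with a made-up `value` column, is to shift only the timestamps that `duplicated()` flags as repeats. Treating the second occurrence as the one that belongs an hour later is an assumption, not something the question confirms.

```python
import pandas as pd

# Shortened copy of the index from the question; the 'value' column is hypothetical
index = pd.DatetimeIndex(
    ['2014-03-12 22:59:59', '2014-03-12 22:59:59',
     '2014-03-12 23:00:59', '2014-03-12 23:00:59',
     '2014-03-12 23:01:59', '2014-03-12 23:02:59',
     '2014-03-12 23:02:59'],
    name='Timestamp')
df = pd.DataFrame({'value': range(len(index))}, index=index)

# True for the second (and any later) occurrence of a timestamp
is_repeat = df.index.duplicated(keep='first')

# Assumption: a repeated timestamp belongs one hour later; others get a zero offset
offsets = pd.to_timedelta(is_repeat.astype(int), unit='h')

# The index is immutable, so build a new one and assign it, keeping the original name
df.index = (df.index + offsets).rename(df.index.name)
```

The rows keep their original order, so the resulting index is unique but not sorted. Whether the repeats should move forward or backward depends on which DST transition produced them.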

This feels a bit hacky to me, so let me know whether it helps.