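Generating consecutive hourly intervals within a specified time span in pandas

Time: 2017-09-05 14:13:53

Tags: python pandas dataframe

I am trying to use pandas to generate a set of hourly intervals within a predefined range of dates. I have used: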

import pandas as pd

print(pd.date_range(start='2013-04-01', end='2013-04-30', freq='1H'))

DatetimeIndex(['2013-04-01 00:00:00', '2013-04-01 01:00:00',
               '2013-04-01 02:00:00', '2013-04-01 03:00:00',
               '2013-04-01 04:00:00', '2013-04-01 05:00:00',
               '2013-04-01 06:00:00', '2013-04-01 07:00:00',
               '2013-04-01 08:00:00', '2013-04-01 09:00:00',
               ...
               '2013-04-29 15:00:00', '2013-04-29 16:00:00',
               '2013-04-29 17:00:00', '2013-04-29 18:00:00',
               '2013-04-29 19:00:00', '2013-04-29 20:00:00',
               '2013-04-29 21:00:00', '2013-04-29 22:00:00',
               '2013-04-29 23:00:00', '2013-04-30 00:00:00'],
              dtype='datetime64[ns]', length=697, freq='H')

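However, read as pairs, this gives one interval every other hour, i.e. [0-1], [2-3], [4-5], ... What I need instead are partitions like [0-1], [1-2], [2-3], ... How can I do this? Thanks in advance.

Expected output: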

DatetimeIndex(['2013-04-01 00:00:00', '2013-04-01 01:00:00',
               '2013-04-01 01:00:00', '2013-04-01 02:00:00',
               '2013-04-01 02:00:00', '2013-04-01 03:00:00',
               '2013-04-01 03:00:00', '2013-04-01 04:00:00',
               '2013-04-01 04:00:00', '2013-04-01 05:00:00',
               ...
               '2013-04-29 19:00:00', '2013-04-29 20:00:00',
               '2013-04-29 20:00:00', '2013-04-29 21:00:00',
               '2013-04-29 21:00:00', '2013-04-29 22:00:00',
               '2013-04-29 22:00:00', '2013-04-29 23:00:00',
               '2013-04-29 23:00:00', '2013-04-30 00:00:00'],
              dtype='datetime64[ns]', length=697, freq='H')

2 answers:

Answer 0 (score: 1)

Here is one way:

In [2249]: d = pd.date_range(start='2013-04-01', end='2013-04-30', freq='H')

In [2250]: pd.DatetimeIndex([v for p in zip(d, d[1:]) for v in p])
Out[2250]:
DatetimeIndex(['2013-04-01 00:00:00', '2013-04-01 01:00:00',
               '2013-04-01 01:00:00', '2013-04-01 02:00:00',
               '2013-04-01 02:00:00', '2013-04-01 03:00:00',
               '2013-04-01 03:00:00', '2013-04-01 04:00:00',
               '2013-04-01 04:00:00', '2013-04-01 05:00:00',
               ...
               '2013-04-29 19:00:00', '2013-04-29 20:00:00',
               '2013-04-29 20:00:00', '2013-04-29 21:00:00',
               '2013-04-29 21:00:00', '2013-04-29 22:00:00',
               '2013-04-29 22:00:00', '2013-04-29 23:00:00',
               '2013-04-29 23:00:00', '2013-04-30 00:00:00'],
              dtype='datetime64[ns]', length=1392, freq=None)

Or, with `itertools` imported:

In [2252]: pd.DatetimeIndex(itertools.chain(*zip(d, d[1:])))
Out[2252]:
DatetimeIndex(['2013-04-01 00:00:00', '2013-04-01 01:00:00',
               '2013-04-01 01:00:00', '2013-04-01 02:00:00',
               '2013-04-01 02:00:00', '2013-04-01 03:00:00',
               '2013-04-01 03:00:00', '2013-04-01 04:00:00',
               '2013-04-01 04:00:00', '2013-04-01 05:00:00',
               ...
               '2013-04-29 19:00:00', '2013-04-29 20:00:00',
               '2013-04-29 20:00:00', '2013-04-29 21:00:00',
               '2013-04-29 21:00:00', '2013-04-29 22:00:00',
               '2013-04-29 22:00:00', '2013-04-29 23:00:00',
               '2013-04-29 23:00:00', '2013-04-30 00:00:00'],
              dtype='datetime64[ns]', length=1392, freq=None)
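
A possible vectorized variant of the same idea, offered only as a sketch and not part of the original answer: repeat every hourly boundary twice, then drop the first and last element. The name `flat` and the `"60min"` frequency string are choices made for this illustration (the latter only to sidestep alias differences between pandas versions); the result is compared against the zip-based approach above.

```python
import pandas as pd

# Hourly boundaries for the same range as in the question.
# "60min" is an assumption chosen for portability across pandas versions.
d = pd.date_range(start="2013-04-01", end="2013-04-30", freq="60min")

# Repeat each boundary twice, then trim both ends so that every inner
# boundary closes one interval and opens the next.
flat = d.repeat(2)[1:-1]

# Reference result from the zip-based approach shown above.
zipped = pd.DatetimeIndex([v for p in zip(d, d[1:]) for v in p])

print(len(d), len(flat))
print(flat.equals(zipped))
```

This avoids the Python-level loop, which may matter for long ranges; for a one-month range either form should be fine.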

Answer 1 (score: 1)

A one-liner that does it directly:

In [237]: pd.date_range(start='2013-04-01', end='2013-04-30', freq='0.5H1U').round('1H')
Out[237]: 
DatetimeIndex(['2013-04-01 00:00:00', '2013-04-01 01:00:00',
               '2013-04-01 01:00:00', '2013-04-01 02:00:00',
               '2013-04-01 02:00:00', '2013-04-01 03:00:00',
               '2013-04-01 03:00:00', '2013-04-01 04:00:00',
               '2013-04-01 04:00:00', '2013-04-01 05:00:00',
               ...
               '2013-04-29 19:00:00', '2013-04-29 20:00:00',
               '2013-04-29 20:00:00', '2013-04-29 21:00:00',
               '2013-04-29 21:00:00', '2013-04-29 22:00:00',
               '2013-04-29 22:00:00', '2013-04-29 23:00:00',
               '2013-04-29 23:00:00', '2013-04-30 00:00:00'],
              dtype='datetime64[ns]', length=1392, freq=None)

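I use a frequency of half an hour plus one microsecond (`U` is the microsecond alias), so that every second point lies just past the half-hour mark and the rounding always falls on the "right" side.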
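
To see why the small offset matters, the first few points can be built by hand and rounded. This is a minimal sketch under assumptions: `step`, `points`, and `rounded` are names invented here, the step is built with `pd.Timedelta` rather than the `'0.5H1U'` string, and `"60min"` stands in for `'1H'` to avoid alias differences between pandas versions.

```python
import pandas as pd

# Half an hour plus one microsecond, built explicitly instead of
# parsing the '0.5H1U' frequency string used in the answer.
step = pd.Timedelta(minutes=30, microseconds=1)
start = pd.Timestamp("2013-04-01")

# First five points of the sequence: 00:00, 00:30:00.000001, ...
points = pd.DatetimeIndex([start + k * step for k in range(5)])

# Each odd-numbered point sits just past a half hour, so it rounds up;
# each even-numbered point sits just past a full hour, so it rounds down.
rounded = points.round("60min")

for raw, r in zip(points, rounded):
    print(raw, "->", r)
```

With an exact 30-minute step, every second point would sit precisely on a tie, and the result would then depend on how the rounding rule resolves ties; the extra microsecond removes that ambiguity. The accumulated drift over the month is 1391 microseconds at most, far below the half hour needed to change a rounded value.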