Python: convert a DatetimeIndex into multiple datetime ranges

Time: 2020-05-13 07:37:03

Tags: python datetime indexing

I have a long DatetimeIndex:

index_list=DatetimeIndex(['2019-08-19 00:00:00', '2019-08-19 00:01:00',
               '2019-08-19 00:02:00', '2019-08-19 00:03:00',
               '2019-08-19 00:04:00', '2019-08-19 00:05:00',
               '2019-08-19 00:06:00', '2019-08-19 00:07:00',
               '2019-08-19 00:08:00', '2019-08-19 00:09:00',
               ...
               '2020-05-08 23:50:00', '2020-05-08 23:51:00',
               '2020-05-08 23:52:00', '2020-05-08 23:53:00',
               '2020-05-08 23:54:00', '2020-05-08 23:55:00',
               '2020-05-08 23:56:00', '2020-05-08 23:57:00',
               '2020-05-08 23:58:00', '2020-05-08 23:59:00'],
          dtype='datetime64[ns]', name='phenomenon_time', length=28037, freq=None)

The base time step is 1 minute:

 Timedelta('0 days 00:01:00')
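
One way such a base step might be obtained is sketched below. It assumes that the most common difference between consecutive timestamps is the intended step; the names `steps` and `base_step` and the short example index are hypothetical.

```python
import pandas as pd

idx = pd.DatetimeIndex(['2019-08-19 00:00:00', '2019-08-19 00:01:00',
                        '2019-08-19 00:02:00', '2019-08-19 00:10:00'])
# differences between consecutive timestamps; the first entry is NaT, so drop it
steps = idx.to_series().diff().dropna()
# assumption: the most frequent difference is the base step
base_step = steps.mode().iloc[0]
```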

I want to find the contiguous ranges within this index. For example, somewhere in the middle:

DatetimeIndex(['2019-08-24 23:54:00', '2019-08-24 23:55:00',
               '2019-08-24 23:56:00', '2019-08-24 23:57:00',
               '2019-08-24 23:58:00', '2019-08-24 23:59:00',
               '2019-08-26 23:00:00', '2019-08-26 23:01:00',
               '2019-08-26 23:02:00', '2019-08-26 23:03:00'],
              dtype='datetime64[ns]', name='phenomenon_time', freq=None)
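
As you can see, there is a time gap between '2019-08-24 23:59:00' and '2019-08-26 23:00:00'. From this, I would like to get two ranges: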

'2019-08-24 23:54:00' - '2019-08-24 23:59:00'

'2019-08-26 23:00:00' - '2019-08-26 23:03:00'

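I don't know in advance how many ranges there are. The simple approach is a loop from start to end that compares the current and next entry at each iteration: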

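import datetime

range_start = index_list[0]
for current, following in zip(index_list, index_list[1:]):
    # a gap larger than the 1-minute step closes the current range
    if following - current > datetime.timedelta(minutes=1):
        print(str(range_start) + ' - ' + str(current))
        range_start = following
# the last range is not closed by a gap, so print it after the loop
print(str(range_start) + ' - ' + str(index_list[-1]))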

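Is there a more Pythonic way to do this? I don't mind whether I get datetime range objects or just a list of strings, as long as I can export the result outside of Python.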

1 answer:

Answer 0: (score: 1)

Is this what you need?

import pandas as pd

# example index
idx = pd.DatetimeIndex(['2019-08-19 00:00:00', '2019-08-19 00:01:00',
                        '2019-08-19 00:02:00', '2019-08-19 00:03:00',
                        '2019-08-19 00:04:00', '2019-08-19 00:06:00',
                        '2019-08-19 00:07:00', '2019-08-19 00:12:00',
                        '2019-08-19 00:25:00', '2019-08-19 00:30:00',
                        '2019-08-19 00:31:00', '2019-08-19 00:32:00'],
                       dtype='datetime64[ns]', name='phenomenon_time', freq=None)

s = idx.to_series() # cast to Series so we can use .diff()

# a range starts wherever the diff to the previous entry is > 1 min. use a boolean mask to select those entries from s.
# prepend the first entry of the series (iloc[0]), since diff won't catch it.
starts = pd.Series([s.iloc[0]] + s[s.diff() > pd.Timedelta('1min')].to_list())
# starts
# 0   2019-08-19 00:00:00
# 1   2019-08-19 00:06:00
# 2   2019-08-19 00:12:00
# 3   2019-08-19 00:25:00
# 4   2019-08-19 00:30:00

# to get the ends of the ranges, shift the mask back by one.
# append the last entry of the series (iloc[-1]), since diff won't catch that either.
ends = pd.Series(s[(s.diff() > pd.Timedelta('1min')).shift(periods=-1, fill_value=False)].to_list() + [s.iloc[-1]])
# ends
# 0   2019-08-19 00:04:00
# 1   2019-08-19 00:07:00
# 2   2019-08-19 00:12:00
# 3   2019-08-19 00:25:00
# 4   2019-08-19 00:32:00
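
To export the result, the start and end of each range can be collected into one table. The following is a minimal sketch of an alternative formulation, not the code from the answer above: it numbers each contiguous block with a cumulative sum over the gap mask and takes the first and last timestamp per block. It assumes a sorted index and a 1-minute base step; the names `find_ranges`, `max_gap`, and `block` are hypothetical.

```python
import pandas as pd

def find_ranges(idx, max_gap=pd.Timedelta('1min')):
    """Return one row per contiguous block of idx, with its start and end."""
    s = idx.to_series()
    # True wherever the step from the previous timestamp exceeds max_gap;
    # the cumulative sum turns that mask into a block number per timestamp
    block = (s.diff() > max_gap).cumsum()
    return s.groupby(block.to_numpy()).agg(start='first', end='last')

idx = pd.DatetimeIndex(['2019-08-24 23:58:00', '2019-08-24 23:59:00',
                        '2019-08-26 23:00:00', '2019-08-26 23:01:00'])
ranges = find_ranges(idx)

# plain strings, one per range, in the 'start - end' form used in the question
fmt = '%Y-%m-%d %H:%M:%S'
lines = (ranges['start'].dt.strftime(fmt) + ' - ' + ranges['end'].dt.strftime(fmt)).to_list()
```

Since `ranges` is an ordinary DataFrame, `DataFrame.to_csv` is one way to write it to a file that can be read outside Python, and `lines` can be written to a text file directly.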