I have a long DatetimeIndex:
index_list=DatetimeIndex(['2019-08-19 00:00:00', '2019-08-19 00:01:00',
'2019-08-19 00:02:00', '2019-08-19 00:03:00',
'2019-08-19 00:04:00', '2019-08-19 00:05:00',
'2019-08-19 00:06:00', '2019-08-19 00:07:00',
'2019-08-19 00:08:00', '2019-08-19 00:09:00',
...
'2020-05-08 23:50:00', '2020-05-08 23:51:00',
'2020-05-08 23:52:00', '2020-05-08 23:53:00',
'2020-05-08 23:54:00', '2020-05-08 23:55:00',
'2020-05-08 23:56:00', '2020-05-08 23:57:00',
'2020-05-08 23:58:00', '2020-05-08 23:59:00'],
dtype='datetime64[ns]', name='phenomenon_time', length=28037, freq=None)
The base time step is 1 minute:
Timedelta('0 days 00:01:00')
I want to find the contiguous ranges within this index. For example, somewhere in the middle:
DatetimeIndex(['2019-08-24 23:54:00', '2019-08-24 23:55:00',
'2019-08-24 23:56:00', '2019-08-24 23:57:00',
'2019-08-24 23:58:00', '2019-08-24 23:59:00',
'2019-08-26 23:00:00', '2019-08-26 23:01:00',
'2019-08-26 23:02:00', '2019-08-26 23:03:00'],
dtype='datetime64[ns]', name='phenomenon_time', freq=None)
As you can see, there is a gap between 2019-08-24 23:59:00 and 2019-08-26 23:00:00. From this, I would like to get two ranges:
'2019-08-24 23:54:00' - '2019-08-24 23:59:00'
and
'2019-08-26 23:00:00' - '2019-08-26 23:03:00'
I don't know in advance how many ranges there are. The simple approach is a loop from start to end that compares each timestamp with its successor:
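import datetime

temptime = index_list[0]  # start of the range currently being tracked
for current, nxt in zip(index_list, index_list[1:]):
    if nxt - current > datetime.timedelta(minutes=1):
        print(str(temptime) + ' - ' + str(current))
        temptime = nxt
# the loop only prints when it hits a gap, so the final range is printed here
print(str(temptime) + ' - ' + str(index_list[-1]))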
Is there a more Pythonic way to do this? I don't mind whether I get date range objects or just a list of strings, as long as I can export the result out of Python.
Answer 0 (score: 1)
Is this what you need?
import pandas as pd
# example index
idx = pd.DatetimeIndex(['2019-08-19 00:00:00', '2019-08-19 00:01:00',
'2019-08-19 00:02:00', '2019-08-19 00:03:00',
'2019-08-19 00:04:00', '2019-08-19 00:06:00',
'2019-08-19 00:07:00', '2019-08-19 00:12:00',
'2019-08-19 00:25:00', '2019-08-19 00:30:00',
'2019-08-19 00:31:00', '2019-08-19 00:32:00'],
dtype='datetime64[ns]', name='phenomenon_time', freq=None)
s = idx.to_series() # cast to Series so we can use .diff()
# start is whenever diff to previous is > 1 min. use boolean mask to get resp. entries from s.
# need to prepend first entry of the series (iloc[0]) since diff won't catch that.
starts = pd.Series([s.iloc[0]] + s[s.diff() > '1min'].to_list())
# starts
# 0 2019-08-19 00:00:00
# 1 2019-08-19 00:06:00
# 2 2019-08-19 00:12:00
# 3 2019-08-19 00:25:00
# 4 2019-08-19 00:30:00
# to get the ends of the periods, shift the mask by one.
# need to add last entry of the series (iloc[-1]) since diff won't catch that either.
ends = pd.Series(s[(s.diff() > '1min').shift(periods=-1, fill_value=False)].to_list() + [s.iloc[-1]])
# ends
# 0 2019-08-19 00:04:00
# 1 2019-08-19 00:07:00
# 2 2019-08-19 00:12:00
# 3 2019-08-19 00:25:00
# 4 2019-08-19 00:32:00
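
If you want the starts and ends paired up in one step, a groupby over a running gap counter may be a bit more compact. The following is only a sketch: the helper name `contiguous_ranges` and its `step` parameter are made up for illustration, and it assumes the index is sorted in ascending order. The sample data is the excerpt from the question.

```python
import pandas as pd


def contiguous_ranges(index, step=pd.Timedelta("1min")):
    """Return one row (start, end) per gap-free run of a sorted DatetimeIndex.

    Hypothetical helper: the name and the `step` parameter are illustrative only.
    """
    s = index.to_series()
    # A new run begins wherever the gap to the previous timestamp exceeds `step`.
    # The cumulative sum of that flag labels each run with its own integer.
    run_id = (s.diff() > step).cumsum().to_numpy()
    return s.groupby(run_id).agg(start="min", end="max").reset_index(drop=True)


# the excerpt from the question: two runs separated by a gap
sample = pd.DatetimeIndex(
    ["2019-08-24 23:54:00", "2019-08-24 23:55:00", "2019-08-24 23:56:00",
     "2019-08-24 23:57:00", "2019-08-24 23:58:00", "2019-08-24 23:59:00",
     "2019-08-26 23:00:00", "2019-08-26 23:01:00", "2019-08-26 23:02:00",
     "2019-08-26 23:03:00"],
    name="phenomenon_time",
)

ranges = contiguous_ranges(sample)

# plain strings in the "start - end" format used in the question
fmt = "%Y-%m-%d %H:%M:%S"
lines = (ranges["start"].dt.strftime(fmt) + " - " + ranges["end"].dt.strftime(fmt)).tolist()
print("\n".join(lines))
```

Since `ranges` is an ordinary DataFrame with `start` and `end` columns, it can be written out with something like `ranges.to_csv(...)` if the goal is to use the result outside Python.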