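I need to analyze, separately, the records that lie between the gaps in a regularly spaced time series.
For example, the following time series has a regular interval of 6 seconds, with a gap between 00:24 and 00:54: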
2018-01-01 00:00:00 4.2
2018-01-01 00:00:06 4.1
2018-01-01 00:00:12 4.3
2018-01-01 00:00:18 3.4
2018-01-01 00:00:24 4.7
2018-01-01 00:00:54 3.3
2018-01-01 00:01:00 8.2
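To locate gaps like this one, only the timestamps are needed. The sketch below assumes the nominal 6-second interval is known in advance. It compares each timestamp with the previous one and reports where every gap starts and ends:

```python
import pandas as pd

# Timestamps from the example above (6-second sampling with one gap).
idx = pd.DatetimeIndex([
    "2018-01-01 00:00:00", "2018-01-01 00:00:06", "2018-01-01 00:00:12",
    "2018-01-01 00:00:18", "2018-01-01 00:00:24", "2018-01-01 00:00:54",
    "2018-01-01 00:01:00",
], name="datetime")

step = pd.Timedelta(seconds=6)   # expected sampling interval (assumed known)
ts = idx.to_series()
is_gap = ts.diff() > step        # True where the previous sample is too far back

# Each gap runs from the last sample before it to the first sample after it.
gaps = pd.DataFrame({
    "gap_start": ts.shift()[is_gap].to_numpy(),
    "gap_end": ts[is_gap].to_numpy(),
})
gaps["duration"] = gaps["gap_end"] - gaps["gap_start"]
print(gaps)
```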
I need to analyze the following two groups separately:
Group 1:
2018-01-01 00:00:00 4.2
2018-01-01 00:00:06 4.1
2018-01-01 00:00:12 4.3
2018-01-01 00:00:18 3.4
2018-01-01 00:00:24 4.7
Group 2:
2018-01-01 00:00:54 3.3
2018-01-01 00:01:00 8.2
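The full dataset is very large and contains many gaps, and the analysis needs to compare consecutive groups.
Here is some code to reproduce the example: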
import pandas as pd

# Regularly spaced 6-second samples with a gap between 00:00:24 and 00:00:54
data_index = pd.DatetimeIndex(['2018-01-01 00:00:00', '2018-01-01 00:00:06', '2018-01-01 00:00:12',
                               '2018-01-01 00:00:18', '2018-01-01 00:00:24', '2018-01-01 00:00:54',
                               '2018-01-01 00:01:00'])
data = [4.2, 4.1, 4.3, 3.4, 4.7, 3.3, 8.2]
df = pd.DataFrame(data_index, columns=['date'])
df['datetime'] = pd.to_datetime(df['date'])
df = df.set_index('datetime')
df.drop(['date'], axis=1, inplace=True)
df['data'] = data
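You can build the same frame more directly by passing a named DatetimeIndex as the index. This avoids the temporary `date` column. The sketch below is just an equivalent construction and does not change the question:

```python
import pandas as pd

# A DatetimeIndex named "datetime" plus a single "data" column.
index = pd.DatetimeIndex(
    ["2018-01-01 00:00:00", "2018-01-01 00:00:06", "2018-01-01 00:00:12",
     "2018-01-01 00:00:18", "2018-01-01 00:00:24", "2018-01-01 00:00:54",
     "2018-01-01 00:01:00"],
    name="datetime",
)
df = pd.DataFrame({"data": [4.2, 4.1, 4.3, 3.4, 4.7, 3.3, 8.2]}, index=index)
print(df)
```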
Answer (score: 2)
# A new group starts wherever the spacing to the previous row exceeds 6 seconds
groups = (df.index.to_series().diff() > pd.Timedelta(seconds=6)).cumsum() + 1
for i, group in df.groupby(groups):
    print(group)
data
datetime
2018-01-01 00:00:00 4.2
2018-01-01 00:00:06 4.1
2018-01-01 00:00:12 4.3
2018-01-01 00:00:18 3.4
2018-01-01 00:00:24 4.7
data
datetime
2018-01-01 00:00:54 3.3
2018-01-01 00:01:00 8.2
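If you need this for many series, you could wrap the same diff/cumsum/groupby idea in a small function. The sketch below uses `split_on_gaps`, a hypothetical helper that is not part of pandas. It assumes a sorted DatetimeIndex with unique timestamps. The threshold is a parameter, so real data with slight timing jitter can use something like 1.5× the nominal step (9 s here) instead of exactly 6 s:

```python
import pandas as pd


def split_on_gaps(df, max_step):
    """Split a frame with a sorted DatetimeIndex into gap-free segments.

    Hypothetical helper, not part of pandas. A new segment starts wherever
    the spacing to the previous row is larger than max_step.
    """
    labels = (df.index.to_series().diff() > max_step).cumsum()
    return [segment for _, segment in df.groupby(labels)]


# Sample data from the question (6-second sampling, one gap).
index = pd.DatetimeIndex(
    ["2018-01-01 00:00:00", "2018-01-01 00:00:06", "2018-01-01 00:00:12",
     "2018-01-01 00:00:18", "2018-01-01 00:00:24", "2018-01-01 00:00:54",
     "2018-01-01 00:01:00"],
    name="datetime",
)
df = pd.DataFrame({"data": [4.2, 4.1, 4.3, 3.4, 4.7, 3.3, 8.2]}, index=index)

# Allow up to 1.5x the nominal 6 s step before treating a spacing as a gap.
segments = split_on_gaps(df, pd.Timedelta(seconds=9))
for seg in segments:
    print(seg.index[0], "to", seg.index[-1], len(seg), "rows")
```

Returning a list keeps the segments in time order, so consecutive segments can be compared by position.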
Details: the group labels look like this:
print(groups)
datetime
2018-01-01 00:00:00 1
2018-01-01 00:00:06 1
2018-01-01 00:00:12 1
2018-01-01 00:00:18 1
2018-01-01 00:00:24 1
2018-01-01 00:00:54 2
2018-01-01 00:01:00 2
Name: datetime, dtype: int64
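To see where these labels come from, you can put each intermediate step in its own column. The sketch below reuses the sample timestamps. It also shows the effect of `>=` instead of `>`: with `>=`, every regular 6-second step would count as a break, and each row would end up in its own group.

```python
import pandas as pd

index = pd.DatetimeIndex(
    ["2018-01-01 00:00:00", "2018-01-01 00:00:06", "2018-01-01 00:00:12",
     "2018-01-01 00:00:18", "2018-01-01 00:00:24", "2018-01-01 00:00:54",
     "2018-01-01 00:01:00"],
    name="datetime",
)
step = pd.Timedelta(seconds=6)

steps = pd.DataFrame(index=index)
steps["diff"] = index.to_series().diff()          # spacing to previous row (NaT for the first)
steps["new_group"] = steps["diff"] > step         # True only where the spacing exceeds 6 s
steps["group"] = steps["new_group"].cumsum() + 1  # running count of gaps, starting at 1

# For contrast: >= also fires on every regular 6 s step.
steps["group_ge"] = (steps["diff"] >= step).cumsum() + 1
print(steps)
```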
To analyze the groups as separate DataFrames, you can store them in a dictionary keyed by group label:
dfs={i:group for i,group in df.groupby(groups)}
print(dfs[1])
data
datetime
2018-01-01 00:00:00 4.2
2018-01-01 00:00:06 4.1
2018-01-01 00:00:12 4.3
2018-01-01 00:00:18 3.4
2018-01-01 00:00:24 4.7
print(dfs[2])
data
datetime
2018-01-01 00:00:54 3.3
2018-01-01 00:01:00 8.2
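The question also asks about comparing consecutive groups. One possible approach, sketched below, summarizes each group in one row and then diffs the summary. The chosen statistics (`first`, `last`, `mean`, `count`) and the mean-change column are only examples of what a comparison might look like. The labels are renamed to `group` so the summary index gets a clearer name than `datetime`:

```python
import pandas as pd

index = pd.DatetimeIndex(
    ["2018-01-01 00:00:00", "2018-01-01 00:00:06", "2018-01-01 00:00:12",
     "2018-01-01 00:00:18", "2018-01-01 00:00:24", "2018-01-01 00:00:54",
     "2018-01-01 00:01:00"],
    name="datetime",
)
df = pd.DataFrame({"data": [4.2, 4.1, 4.3, 3.4, 4.7, 3.3, 8.2]}, index=index)

# Group labels as in the answer, renamed so the summary index is called "group".
groups = ((df.index.to_series().diff() > pd.Timedelta(seconds=6)).cumsum() + 1).rename("group")

# One summary row per gap-free group.
summary = df.groupby(groups)["data"].agg(["first", "last", "mean", "count"])
# Change of each group's mean relative to the preceding group (NaN for the first).
summary["mean_change"] = summary["mean"].diff()
print(summary)

# Alternatively, iterate over consecutive pairs of groups directly.
parts = [g for _, g in df.groupby(groups)]
for prev, curr in zip(parts, parts[1:]):
    print(prev.index[-1], "->", curr.index[0],
          "mean change:", curr["data"].mean() - prev["data"].mean())
```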