I have this DataFrame:
timestamp Val1
2020-04-02 06:44:00 NaN
2020-04-03 16:52:00 NaN
2020-04-03 16:53:00 NaN
2020-04-03 16:54:00 NaN
2020-04-03 16:55:00 NaN
2020-04-17 02:03:00 NaN
2020-04-17 02:04:00 NaN
2020-04-17 02:05:00 NaN
2020-04-17 02:06:00 NaN
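I am trying to split the rows into groups of consecutive minutes. That is, rows that are more than 1 minute apart must not end up in the same group. So the output would look like this: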
#Group 1
timestamp Val1
2020-04-02 06:44:00 NaN
#Group 2
timestamp Val1
2020-04-03 16:52:00 NaN
2020-04-03 16:53:00 NaN
2020-04-03 16:54:00 NaN
2020-04-03 16:55:00 NaN
#Group 3
timestamp Val1
2020-04-17 02:03:00 NaN
2020-04-17 02:04:00 NaN
2020-04-17 02:05:00 NaN
2020-04-17 02:06:00 NaN
Right now I can only get the minimum and maximum over all of the data, which is not what I am trying to do.
Answer 0: (score: 1)
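Take the difference between consecutive rows and check whether it exceeds the desired gap ('1min'). Taking the cumsum of this Boolean Series creates the group labels. I have assigned the result to a column here for illustration.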
# Uncomment if 'timestamp' is stored as strings rather than datetimes:
#df['timestamp'] = pd.to_datetime(df['timestamp'])
df['group'] = df['timestamp'].diff().gt('1min').cumsum()
timestamp Val1 group
0 2020-04-02 06:44:00 NaN 0
1 2020-04-03 16:52:00 NaN 1
2 2020-04-03 16:53:00 NaN 1
3 2020-04-03 16:54:00 NaN 1
4 2020-04-03 16:55:00 NaN 1
5 2020-04-17 02:03:00 NaN 2
6 2020-04-17 02:04:00 NaN 2
7 2020-04-17 02:05:00 NaN 2
8 2020-04-17 02:06:00 NaN 2
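
To get from the labels to the separate groups asked for in the question, one option is to pass the label column to groupby. The following is only a minimal sketch that rebuilds the sample data from the question. It assumes the timestamps are already sorted in ascending order, and the names gap, new_group, groups and summary are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-04-02 06:44:00",
        "2020-04-03 16:52:00",
        "2020-04-03 16:53:00",
        "2020-04-03 16:54:00",
        "2020-04-03 16:55:00",
        "2020-04-17 02:03:00",
        "2020-04-17 02:04:00",
        "2020-04-17 02:05:00",
        "2020-04-17 02:06:00",
    ]),
    "Val1": np.nan,
})

# Gap to the previous row; the first row has no predecessor, so its gap is NaT.
gap = df["timestamp"].diff()

# True wherever a new group starts. NaT compares as False,
# so the first row stays in group 0.
new_group = gap > pd.Timedelta("1min")

# Each True increments the running total, which yields the group label.
df["group"] = new_group.cumsum()

# One sub-DataFrame per group (assumed goal: the "#Group N" blocks in the question).
groups = [g.drop(columns="group") for _, g in df.groupby("group")]

# Per-group minimum and maximum timestamp, plus the row count.
summary = df.groupby("group")["timestamp"].agg(["min", "max", "count"])
print(summary)
```

If the rows are not sorted, calling df.sort_values('timestamp') first should be enough, since diff only compares each row with the one directly before it.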