Splitting a dataframe into multiple (consecutive) time series

Time: 2017-06-13 10:18:15

Tags: python pandas

I am running an experiment in which I measure a valve while it opens and closes. Limit switches indicate fully open and fully closed. I am only interested in the data while the valve is closing or opening. My pandas dataset looks like this (simplified):

Time                       Flow_A    Flow_B      Open closed            
2017-06-12 09:46:31.068    0.000933  295.933070  1    0
2017-06-12 09:46:31.660    0.287122  292.727820  1    0
2017-06-12 09:46:32.252    0.256170  288.869600  0    0
2017-06-12 09:46:32.844    0.052523  284.265850  0    0
2017-06-12 09:46:33.437    0.367495  278.394200  0    1
2017-06-12 09:46:34.029    1.956472  270.846450  0    1
2017-06-12 09:46:34.621    5.265860  260.768250  0    0
2017-06-12 09:46:35.214   12.328835  248.132450  0    0
2017-06-12 09:46:35.807   22.592590  232.688620  1    0
2017-06-12 09:46:36.400   35.768205  214.997420  1    0
2017-06-12 09:46:36.992   51.623265  195.298150  1    0
2017-06-12 09:46:37.584   70.855590  174.048000  1    0

I have already figured out how to get the regions of interest in Python:

# parentheses around each comparison are required because & binds tighter than ==
mask = (data['Open'] == 0) & (data['closed'] == 0)
data.loc[mask]

This gives me:

Time                       Flow_A    Flow_B      Open closed
2017-06-12 09:46:32.252    0.256170  288.869600  0    0
2017-06-12 09:46:32.844    0.052523  284.265850  0    0
2017-06-12 09:46:34.621    5.265860  260.768250  0    0
2017-06-12 09:46:35.214   12.328835  248.132450  0    0

The question is how to split/group/groupby/subset this into the two consecutive datasets. The duration of each period is unknown, and the intervals between log entries are not exactly equal. I want to find the consecutive runs within the mask, but I don't know how to do it.

1 answer:

Answer 0 (score: 0):

I think you need:

mask = (data['Open']==0) & (data['closed'] == 0)
# flag every row where the mask flips, then take the cumulative sum of those
# flags over the masked rows only; each consecutive True run gets its own number
data.loc[mask, 'groups'] = mask.ne(mask.shift())[mask].cumsum()
print (data)
                    Time     Flow_A     Flow_B  Open  closed  groups
2017-06-12  09:46:31.068   0.000933  295.93307     1       0     NaN
2017-06-12  09:46:31.660   0.287122  292.72782     1       0     NaN
2017-06-12  09:46:32.252   0.256170  288.86960     0       0     1.0
2017-06-12  09:46:32.844   0.052523  284.26585     0       0     1.0
2017-06-12  09:46:33.437   0.367495  278.39420     0       1     NaN
2017-06-12  09:46:34.029   1.956472  270.84645     0       1     NaN
2017-06-12  09:46:34.621   5.265860  260.76825     0       0     2.0
2017-06-12  09:46:35.214  12.328835  248.13245     0       0     2.0
2017-06-12  09:46:35.807  22.592590  232.68862     1       0     NaN
2017-06-12  09:46:36.400  35.768205  214.99742     1       0     NaN
2017-06-12  09:46:36.992  51.623265  195.29815     1       0     NaN
2017-06-12  09:46:37.584  70.855590  174.04800     1       0     NaN

print (data[mask])
                    Time     Flow_A     Flow_B  Open  closed  groups
2017-06-12  09:46:32.252   0.256170  288.86960     0       0     1.0
2017-06-12  09:46:32.844   0.052523  284.26585     0       0     1.0
2017-06-12  09:46:34.621   5.265860  260.76825     0       0     2.0
2017-06-12  09:46:35.214  12.328835  248.13245     0       0     2.0
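
With the groups column in place, each consecutive run can be pulled out as its own DataFrame via groupby. A minimal sketch, assuming the data and mask from above (the name closing_runs is mine, not from the question):

# collect each consecutive run of masked rows as a separate DataFrame
closing_runs = [g for _, g in data[mask].groupby('groups')]
print(closing_runs[0])  # first consecutive block (groups == 1.0)
print(closing_runs[1])  # second consecutive block (groups == 2.0)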

Alternatively, if you need integer group labels that start at 0:

data.loc[mask, 'groups'] = mask.ne(mask.shift())[mask].cumsum()
# NaN rows become 0 after fillna, so subtracting 1 maps them to -1
# and shifts the real group numbers down to start at 0
data['groups'] = data['groups'].fillna(0).astype(int) - 1
print (data)
                    Time     Flow_A     Flow_B  Open  closed  groups
2017-06-12  09:46:31.068   0.000933  295.93307     1       0      -1
2017-06-12  09:46:31.660   0.287122  292.72782     1       0      -1
2017-06-12  09:46:32.252   0.256170  288.86960     0       0       0
2017-06-12  09:46:32.844   0.052523  284.26585     0       0       0
2017-06-12  09:46:33.437   0.367495  278.39420     0       1      -1
2017-06-12  09:46:34.029   1.956472  270.84645     0       1      -1
2017-06-12  09:46:34.621   5.265860  260.76825     0       0       1
2017-06-12  09:46:35.214  12.328835  248.13245     0       0       1
2017-06-12  09:46:35.807  22.592590  232.68862     1       0      -1
2017-06-12  09:46:36.400  35.768205  214.99742     1       0      -1
2017-06-12  09:46:36.992  51.623265  195.29815     1       0      -1
2017-06-12  09:46:37.584  70.855590  174.04800     1       0      -1

print (data[mask])
                    Time     Flow_A     Flow_B  Open  closed  groups
2017-06-12  09:46:32.252   0.256170  288.86960     0       0       0
2017-06-12  09:46:32.844   0.052523  284.26585     0       0       0
2017-06-12  09:46:34.621   5.265860  260.76825     0       0       1
2017-06-12  09:46:35.214  12.328835  248.13245     0       0       1
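
Why mask.ne(mask.shift())[mask].cumsum() works: comparing the mask against a copy of itself shifted down one row flags every position where the boolean value changes, and the cumulative sum of those change flags numbers the runs. A self-contained toy sketch (the Series values here are illustrative, not the question's data):

import pandas as pd

mask = pd.Series([False, True, True, False, True, True])
change = mask.ne(mask.shift())   # True wherever the value differs from the previous row
print(change.tolist())           # [True, True, False, True, True, False]
groups = change[mask].cumsum()   # count the changes, kept only on the True rows
print(groups.tolist())           # [1, 1, 2, 2] -> two consecutive runs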