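I am running an experiment in which I measure a valve while it opens and closes. I have limit switches that indicate fully open and fully closed. I am only interested in the data recorded while the valve is closing or opening. My pandas dataset looks like this (simplified):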
Time Flow_A Flow_B Open closed
2017-06-12 09:46:31.068 0.000933 295.933070 1 0
2017-06-12 09:46:31.660 0.287122 292.727820 1 0
2017-06-12 09:46:32.252 0.256170 288.869600 0 0
2017-06-12 09:46:32.844 0.052523 284.265850 0 0
2017-06-12 09:46:33.437 0.367495 278.394200 0 1
2017-06-12 09:46:34.029 1.956472 270.846450 0 1
2017-06-12 09:46:34.621 5.265860 260.768250 0 0
2017-06-12 09:46:35.214 12.328835 248.132450 0 0
2017-06-12 09:46:35.807 22.592590 232.688620 1 0
2017-06-12 09:46:36.400 35.768205 214.997420 1 0
2017-06-12 09:46:36.992 51.623265 195.298150 1 0
2017-06-12 09:46:37.584 70.855590 174.048000 1 0
I have figured out how to select the region of interest in Python:
mask = (data['Open'] == 0) & (data['closed'] == 0)
data.loc[mask]
This gives me:
Time Flow_A Flow_B Open closed
2017-06-12 09:46:32.252 0.256170 288.869600 0 0
2017-06-12 09:46:32.844 0.052523 284.265850 0 0
2017-06-12 09:46:34.621 5.265860 260.768250 0 0
2017-06-12 09:46:35.214 12.328835 248.132450 0 0
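The question is how to split/group/subset this into the two consecutive datasets. The length of each period is unknown, and the interval between log entries is not exactly constant. I would like to find the runs of consecutive rows in the mask, but I do not know how to do that.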
Answer 0 (score: 0)
I think you need:
mask = (data['Open']==0) & (data['closed'] == 0)
data.loc[mask, 'groups'] = mask.ne(mask.shift())[mask].cumsum()
print (data)
Time Flow_A Flow_B Open closed groups
2017-06-12 09:46:31.068 0.000933 295.93307 1 0 NaN
2017-06-12 09:46:31.660 0.287122 292.72782 1 0 NaN
2017-06-12 09:46:32.252 0.256170 288.86960 0 0 1.0
2017-06-12 09:46:32.844 0.052523 284.26585 0 0 1.0
2017-06-12 09:46:33.437 0.367495 278.39420 0 1 NaN
2017-06-12 09:46:34.029 1.956472 270.84645 0 1 NaN
2017-06-12 09:46:34.621 5.265860 260.76825 0 0 2.0
2017-06-12 09:46:35.214 12.328835 248.13245 0 0 2.0
2017-06-12 09:46:35.807 22.592590 232.68862 1 0 NaN
2017-06-12 09:46:36.400 35.768205 214.99742 1 0 NaN
2017-06-12 09:46:36.992 51.623265 195.29815 1 0 NaN
2017-06-12 09:46:37.584 70.855590 174.04800 1 0 NaN
print (data[mask])
Time Flow_A Flow_B Open closed groups
2017-06-12 09:46:32.252 0.256170 288.86960 0 0 1.0
2017-06-12 09:46:32.844 0.052523 284.26585 0 0 1.0
2017-06-12 09:46:34.621 5.265860 260.76825 0 0 2.0
2017-06-12 09:46:35.214 12.328835 248.13245 0 0 2.0
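To make the one-liner above easier to follow, here is a minimal sketch that breaks it into its intermediate steps. The small frame and the names `changed`, `starts` and `run_id` are illustrative assumptions, not part of the original answer; only the `Open` and `closed` columns are reproduced.

```python
import pandas as pd

# Stand-in for the two limit-switch columns from the question (first ten rows).
data = pd.DataFrame({
    "Open":   [1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
    "closed": [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
})

# True while the valve is neither fully open nor fully closed.
mask = (data["Open"] == 0) & (data["closed"] == 0)

# True wherever the mask differs from the previous row, i.e. at the
# first row of every run of equal values (the first row compares
# against a missing value, so it is flagged as well).
changed = mask.ne(mask.shift())

# Keep only the flags for rows inside the mask: each remaining True
# marks the start of a new run of interest.
starts = changed[mask]

# The cumulative sum turns those start flags into run numbers.
run_id = starts.cumsum()
print(run_id)
```

Because `run_id` keeps the original row index, assigning it with `data.loc[mask, 'groups']` lines it up with the right rows and leaves every other row as NaN.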
Also, if you need integer labels that start from 0 (rows outside the mask become -1):
data.loc[mask, 'groups'] = mask.ne(mask.shift())[mask].cumsum()
data['groups'] = data['groups'].fillna(0).astype(int) - 1
print (data)
Time Flow_A Flow_B Open closed groups
2017-06-12 09:46:31.068 0.000933 295.93307 1 0 -1
2017-06-12 09:46:31.660 0.287122 292.72782 1 0 -1
2017-06-12 09:46:32.252 0.256170 288.86960 0 0 0
2017-06-12 09:46:32.844 0.052523 284.26585 0 0 0
2017-06-12 09:46:33.437 0.367495 278.39420 0 1 -1
2017-06-12 09:46:34.029 1.956472 270.84645 0 1 -1
2017-06-12 09:46:34.621 5.265860 260.76825 0 0 1
2017-06-12 09:46:35.214 12.328835 248.13245 0 0 1
2017-06-12 09:46:35.807 22.592590 232.68862 1 0 -1
2017-06-12 09:46:36.400 35.768205 214.99742 1 0 -1
2017-06-12 09:46:36.992 51.623265 195.29815 1 0 -1
2017-06-12 09:46:37.584 70.855590 174.04800 1 0 -1
print (data[mask])
Time Flow_A Flow_B Open closed groups
2017-06-12 09:46:32.252 0.256170 288.86960 0 0 0
2017-06-12 09:46:32.844 0.052523 284.26585 0 0 0
2017-06-12 09:46:34.621 5.265860 260.76825 0 0 1
2017-06-12 09:46:35.214 12.328835 248.13245 0 0 1
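Once the runs are labelled, one possible way to get the separate datasets the question asks for is to group the masked rows by that label. This is a sketch under assumptions: the frame below reproduces only `Flow_A` and the switch columns for the first ten rows, and `run_id` and `segments` are hypothetical names.

```python
import pandas as pd

# Reduced copy of the sample data (first ten rows, Flow_B omitted).
data = pd.DataFrame({
    "Flow_A": [0.000933, 0.287122, 0.256170, 0.052523, 0.367495,
               1.956472, 5.265860, 12.328835, 22.592590, 35.768205],
    "Open":   [1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
    "closed": [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
})

mask = (data["Open"] == 0) & (data["closed"] == 0)

# Same labelling as in the answer: one number per consecutive run.
run_id = mask.ne(mask.shift())[mask].cumsum()

# Group only the rows of interest; each group is one consecutive run.
segments = [segment for _, segment in data[mask].groupby(run_id)]

for segment in segments:
    print(segment, end="\n\n")
```

Each element of `segments` is an ordinary DataFrame, so the transitions can be analysed or plotted independently regardless of how long each one lasted.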