I want to calculate, from a dataset, how long a server was stopped. I know when each stop happened, but not how long it lasted.
I have this df:
index a b c reboot
2018-06-25 12:51:00 NaN NaN NaN 1
2018-06-25 12:52:00 NaN NaN NaN 0
2018-06-25 12:53:00 NaN NaN NaN 0
2018-06-25 12:54:00 NaN NaN NaN 0
2018-06-25 12:55:00 NaN NaN NaN 0
2018-06-25 12:56:00 NaN NaN NaN 0
2018-06-25 12:57:00 NaN NaN NaN 0
2018-06-25 12:58:00 NaN 0.6 0.6 0
2018-06-25 12:59:00 NaN NaN 0.5 0
2018-06-25 13:00:00 NaN NaN 0.3 0
2018-06-25 13:01:00 2.55 94.879997 0.23 0
2018-06-25 13:02:00 1.17 NaN 0.13 0
2018-06-25 13:03:00 1.08 98.199997 0.10 0
2018-06-25 13:28:00 NaN NaN NaN 1
2018-06-25 13:29:00 NaN NaN NaN 0
2018-06-25 13:30:00 NaN NaN NaN 0
2018-06-25 13:31:00 NaN NaN NaN 0
2018-06-25 13:31:00 0.5 0.2 0.1 0
2018-06-25 13:32:00 NaN NaN NaN 0
2018-06-25 13:33:00 NaN NaN NaN 0
2018-06-25 13:34:00 3 0.6 0.5 0
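For each row with reboot == 1, I want to count the consecutive rows, starting at that row, in which a, b and c are all NaN. The result should look like this: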
index period reboot
2018-06-25 12:51:00 7 1
2018-06-25 13:28:00 4 1
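A minimal pandas sketch of the requested output. It assumes that the data sit in a DataFrame with a DatetimeIndex and a 0/1 `reboot` column. It also assumes that "period" means the number of consecutive all-NaN rows starting at each reboot row, which reproduces the 7 and 4 above. The frame is rebuilt from the first table so the block runs on its own. The names `all_nan`, `block`, `leading` and `in_block` are hypothetical helpers, not from the question.

```python
import numpy as np
import pandas as pd

# Rebuild the first table from the question (13:31:00 appears twice).
nan = np.nan
times = ["12:51", "12:52", "12:53", "12:54", "12:55", "12:56", "12:57",
         "12:58", "12:59", "13:00", "13:01", "13:02", "13:03",
         "13:28", "13:29", "13:30", "13:31", "13:31", "13:32", "13:33", "13:34"]
df = pd.DataFrame(
    {
        "a": [nan] * 10 + [2.55, 1.17, 1.08] + [nan] * 4 + [0.5, nan, nan, 3.0],
        "b": [nan] * 7 + [0.6, nan, nan, 94.879997, nan, 98.199997]
             + [nan] * 4 + [0.2, nan, nan, 0.6],
        "c": [nan] * 7 + [0.6, 0.5, 0.3, 0.23, 0.13, 0.10]
             + [nan] * 4 + [0.1, nan, nan, 0.5],
        "reboot": [1] + [0] * 12 + [1] + [0] * 7,
    },
    index=pd.to_datetime(["2018-06-25 " + t for t in times]),
)

# True where a, b and c are all NaN in the same row
all_nan = df[["a", "b", "c"]].isna().all(axis=1)
# Block id increases by one at every reboot row, so each block starts with a reboot
block = df["reboot"].cumsum().to_numpy()
# Stays 1 while the block is still in its leading all-NaN run,
# and drops to 0 for good at the first row that has any value
leading = all_nan.astype(int).groupby(block).cummin()
# Ignore rows that come before the first reboot (block 0)
in_block = block > 0
period = leading[in_block].groupby(block[in_block]).sum()

result = pd.DataFrame(
    {"period": period.to_numpy(), "reboot": 1},
    index=df.index[df["reboot"] == 1],
)
print(result)
```

The running minimum stops the count at the first row that has data. All-NaN rows that appear later in the same block (13:32 and 13:33 here) are therefore not added to the 13:28 period.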
I have tried doing it column by column, without the reboot condition.
Input:
index a b c reboot
2018-06-25 12:51:00 NaN NaN NaN 1
2018-06-25 12:52:00 NaN NaN NaN 0
2018-06-25 12:53:00 NaN NaN NaN 0
2018-06-25 12:54:00 NaN NaN NaN 0
2018-06-25 12:55:00 NaN NaN NaN 0
2018-06-25 12:56:00 NaN NaN NaN 0
2018-06-25 12:57:00 NaN NaN NaN 0
2018-06-25 12:58:00 NaN NaN NaN 0
2018-06-25 12:59:00 NaN NaN NaN 0
2018-06-25 13:00:00 NaN NaN NaN 0
2018-06-25 13:01:00 2.55 94.879997 0.23 0
2018-06-25 13:02:00 1.17 NaN 0.13 0
2018-06-25 13:03:00 1.08 98.199997 0.10 0
2018-06-25 13:28:00 NaN NaN NaN 1
2018-06-25 13:29:00 NaN NaN NaN 0
2018-06-25 13:30:00 NaN NaN NaN 0
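import numpy as np

step = 2  # assumed minimum run length; `step` is not defined in the original snippet
a = df.index
b = df.b.values
# Positions where b switches between NaN and non-NaN, plus both ends
idx0 = np.flatnonzero(np.r_[True, np.diff(np.isnan(b)) != 0, True])
count = np.diff(idx0)  # length of each run
idx = idx0[:-1]        # first row of each run
# Keep only NaN runs that are at least `step` rows long
valid_mask = (count >= step) & np.isnan(b[idx])
out_num = a[idx[valid_mask]]
out_count = count[valid_mask]
periodb = list(zip(out_num, out_count))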
Result:
[(Timestamp('2018-06-25 12:51:00'), 10),
 (Timestamp('2018-06-25 13:28:00'), 3)]
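The same run-length idea can be pointed at the combined condition instead of a single column. This is a sketch that makes two assumptions. First, `np.isnan(b)` is replaced by a mask that is True only when a, b and c are all NaN. Second, a run counts only if its first row has reboot == 1, which takes the place of the `step` threshold. It uses the first table from the question (rebuilt here), not the shorter input above.

```python
import numpy as np
import pandas as pd

# Rebuild the first table from the question (13:31:00 appears twice).
nan = np.nan
times = ["12:51", "12:52", "12:53", "12:54", "12:55", "12:56", "12:57",
         "12:58", "12:59", "13:00", "13:01", "13:02", "13:03",
         "13:28", "13:29", "13:30", "13:31", "13:31", "13:32", "13:33", "13:34"]
df = pd.DataFrame(
    {
        "a": [nan] * 10 + [2.55, 1.17, 1.08] + [nan] * 4 + [0.5, nan, nan, 3.0],
        "b": [nan] * 7 + [0.6, nan, nan, 94.879997, nan, 98.199997]
             + [nan] * 4 + [0.2, nan, nan, 0.6],
        "c": [nan] * 7 + [0.6, 0.5, 0.3, 0.23, 0.13, 0.10]
             + [nan] * 4 + [0.1, nan, nan, 0.5],
        "reboot": [1] + [0] * 12 + [1] + [0] * 7,
    },
    index=pd.to_datetime(["2018-06-25 " + t for t in times]),
)

# Row-wise "a, b and c are all NaN", as a plain boolean array
nan_mask = df[["a", "b", "c"]].isna().all(axis=1).to_numpy()
reboot = df["reboot"].to_numpy()

# Run boundaries: positions where nan_mask changes value, plus both ends
idx0 = np.flatnonzero(np.r_[True, np.diff(nan_mask.astype(np.int8)) != 0, True])
count = np.diff(idx0)  # length of each run
idx = idx0[:-1]        # first row of each run
# Keep the all-NaN runs whose first row is a reboot
valid_mask = nan_mask[idx] & (reboot[idx] == 1)
period = list(zip(df.index[idx[valid_mask]], count[valid_mask]))
print(period)
```

One limitation of this variant is that it misses a reboot that falls in the middle of an all-NaN run. The run would start before the reboot row, so the start-of-run check fails.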
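Answer (score: 0)
Add another column holding a "normal" index (integers counting up from 0) and select the rows of interest. Then find the differences between adjacent values in the added column. Those differences give you the distance between the corresponding rows in the original data.
Something like: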
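# Helper column holding the plain 0-based row position
numbered = df.assign(row=range(len(df)))
# Keep only the rows where a reboot happened
restarts = numbered[numbered.reboot == 1]
# Row distance from each reboot to the next one (NaN for the last reboot)
result = restarts.row.shift(-1) - restarts.row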
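(Looking more closely, part of the question seems to be counting only the rows where a, b and c are all NaN. To handle that, first filter out all the other rows, and then add the helper index column.)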
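Here is a sketch of the answer's approach with the filtering step applied. It assumes that "filter" means keeping only the rows where a, b and c are all NaN before numbering them. The frame is rebuilt from the question's first table, and `nan_rows` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Rebuild the first table from the question (13:31:00 appears twice).
nan = np.nan
times = ["12:51", "12:52", "12:53", "12:54", "12:55", "12:56", "12:57",
         "12:58", "12:59", "13:00", "13:01", "13:02", "13:03",
         "13:28", "13:29", "13:30", "13:31", "13:31", "13:32", "13:33", "13:34"]
df = pd.DataFrame(
    {
        "a": [nan] * 10 + [2.55, 1.17, 1.08] + [nan] * 4 + [0.5, nan, nan, 3.0],
        "b": [nan] * 7 + [0.6, nan, nan, 94.879997, nan, 98.199997]
             + [nan] * 4 + [0.2, nan, nan, 0.6],
        "c": [nan] * 7 + [0.6, 0.5, 0.3, 0.23, 0.13, 0.10]
             + [nan] * 4 + [0.1, nan, nan, 0.5],
        "reboot": [1] + [0] * 12 + [1] + [0] * 7,
    },
    index=pd.to_datetime(["2018-06-25 " + t for t in times]),
)

# Keep only the rows where a, b and c are all NaN
nan_rows = df[df[["a", "b", "c"]].isna().all(axis=1)]
# Helper column: 0-based position inside the filtered frame
numbered = nan_rows.assign(row=range(len(nan_rows)))
restarts = numbered[numbered.reboot == 1]
# Number of all-NaN rows from each reboot up to the next reboot
result = restarts.row.shift(-1) - restarts.row
print(result)
```

On this data the result is 7.0 for 12:51, which matches the expected output, and NaN for 13:28 because no later reboot exists to measure against. The method also counts every all-NaN row before the next reboot, not just the consecutive run. The 13:28 block would therefore get 6 rather than 4 if it were measured to the end of the data, since 13:32 and 13:33 would be included.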